[00:20] *** zerkalo has joined #archiveteam-bs
[00:27] pizzaiolo: Checked, it's already there.
[00:28] We might want to consider doing it soon though
[00:29] *** nickname_ has quit IRC (Read error: Operation timed out)
[00:32] *** espes__ has joined #archiveteam-bs
[00:47] *** odemg has joined #archiveteam-bs
[00:54] *** odemg has quit IRC (Remote host closed the connection)
[01:02] *** godane has left
[01:21] *** odemg has joined #archiveteam-bs
[01:44] *** BlueMaxim has joined #archiveteam-bs
[01:49] *** Darkstar has quit IRC (Ping timeout: 506 seconds)
[01:58] *** icedice has quit IRC (Quit: Leaving)
[02:09] *** kristian_ has quit IRC (Quit: Leaving)
[02:16] *** vitzli has joined #archiveteam-bs
[02:18] *** yan has quit IRC (Read error: Operation timed out)
[02:28] *** nickname_ has joined #archiveteam-bs
[02:29] *** Darkstar has joined #archiveteam-bs
[02:40] *** schbirid2 has joined #archiveteam-bs
[02:43] *** username1 has quit IRC (Read error: Operation timed out)
[03:01] *** zhongfu has quit IRC (Ping timeout: 260 seconds)
[03:04] *** pizzaiolo has left
[03:08] *** godane has joined #archiveteam-bs
[03:08] looks like ftp://aftp.cmdl.noaa.gov/ is gone
[03:09] welp
[03:09] glad we did that one first
[03:12] nice
[03:20] *** vitzli has quit IRC (Quit: Leaving)
[03:27] *** Asparagir has quit IRC (Read error: Operation timed out)
[03:28] *** Asparagir has joined #archiveteam-bs
[04:23] *** nickname_ has quit IRC (Read error: Operation timed out)
[04:25] *** nickname_ has joined #archiveteam-bs
[04:32] *** Stiletto has quit IRC (Read error: Operation timed out)
[04:32] *** Stil3tt0 has joined #archiveteam-bs
[05:00] *** nickname_ has quit IRC (Read error: Operation timed out)
[05:12] *** ndizzle has joined #archiveteam-bs
[05:18] *** Somebody2 has quit IRC (Read error: Operation timed out)
[05:19] *** ndiddy has quit IRC (Read error: Operation timed out)
[05:35] *** Somebody2 has joined #archiveteam-bs
[05:45] *** ndiddy has joined #archiveteam-bs
[05:45] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[05:48] *** ndizzle has quit IRC (Ping timeout: 244 seconds)
[05:50] i'm uploading rev copies of UN Daily Radio
[05:51] they do revision copies sometimes to update their radio program
[05:52] *** Sk1d has joined #archiveteam-bs
[06:17] *** ravetcofx has joined #archiveteam-bs
[06:31] *** Honno has joined #archiveteam-bs
[07:06] looks like ftp://aftp.cmdl.noaa.gov/ is gone <- gone? the appeal for backing it up was on reddit maybe 10 hours ago. i think everyone from /r/datahoarders tried to wget it at once
[07:09] oh, that may not have been a good idea
[07:15] *** Aranje has quit IRC (Quit: Three sheets to the wind)
[07:18] do we have a complete copy of it? iirc archiving it was started in late december?
[07:18] so we should have a copy of it
[07:23] *** Stil3tt0 is now known as Stiletto
[07:24] *** pikhq has quit IRC (Read error: Operation timed out)
[07:30] *** pikhq has joined #archiveteam-bs
[07:40] *** atomicthu has quit IRC (hub.dk irc.homelien.no)
[07:40] *** PotcFdk has quit IRC (hub.dk irc.homelien.no)
[07:40] *** alfie has quit IRC (hub.dk irc.homelien.no)
[08:11] *** alfie has joined #archiveteam-bs
[08:21] *** GE has joined #archiveteam-bs
[08:45] https://archive.org/details/archiveteam_ftpgov?sort=-publicdate&and[]=aftp.cmdl.noaa.gov
[08:45] We have a lot of it. arkiver will know
[08:47] I think they DDOSed it
[08:48] root@teamarchive0:/0/GODANERATOR# ftp aftp.cmdl.noaa.gov
[08:48] Connected to aftp.cmdl.noaa.gov.
[08:48] 421 There are too many connected users, please try later.
[08:48] ftp>
[08:53] *** zhongfu has joined #archiveteam-bs
[09:25] *** bwn has quit IRC (Read error: Operation timed out)
[09:26] *** ravetcofx has quit IRC (Read error: Operation timed out)
[09:35] SketchCow: well, based on http://www.sciencemag.org/news/2017/01/trump-officials-suspend-plan-delete-epa-climate-web-page?utm_source=newsfromscience&utm_medium=facebook-text&utm_campaign=suspendepa-10685 i don't know if it will go down anytime soon, so maybe wait for people to get bored of archiving, or ask people on reddit to nicely stop hammering the server, especially with multiple connections
[09:36] https://www.reddit.com/r/DataHoarder/comments/5q4xxe/erik_fichtner_on_twitter_please_wget_m_np/?utm_content=comments&utm_medium=hot&utm_source=reddit&utm_name=DataHoarder is the post about it
[09:36] but it's spawned by a tweet, not from reddit itself
[09:36] hammering the server into oblivion just means nobody gets the data
[09:36] arkiver: ^
[09:37] https://www.reddit.com/r/DataHoarder/comments/5q4xxe/erik_fichtner_on_twitter_please_wget_m_np/dcwurts/ might be a good person to contact if they aren't already in here
[09:38] since that might be everything, unless they're actively mirroring more
[09:38] also someone on the news.ycombinator.com thread about this
[09:39] https://news.ycombinator.com/item?id=13487843
[09:39] said they have an internet2 campus link pulling the data as well
[09:48] This is a lot of talking for a simple thing
[09:48] *** GE has quit IRC (Remote host closed the connection)
[09:54] SketchCow: ok, do we have a plan? we can continue probing the ftp periodically until a slot opens up?
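Editor's sketch, not part of the log: the "probe periodically until a slot opens up" idea above could look roughly like the following, using Python's standard ftplib. The function names (backoff_delays, wait_for_slot, connect_anonymous) and the delay values are illustrative assumptions, not anything the channel actually ran. The point is to retry on the temporary "421" refusal with growing pauses instead of hammering the server.

```python
import ftplib
import random
import time


def backoff_delays(base=60.0, factor=2.0, cap=3600.0, attempts=8):
    """Yield capped exponential delays in seconds (hypothetical defaults)."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def wait_for_slot(connect, delays, sleep=time.sleep, jitter=random.uniform):
    """Call connect() until it stops raising a temporary (4xx) FTP error.

    Returns whatever connect() returns, or None if every attempt was refused.
    Network-level failures (OSError) are deliberately not caught here.
    """
    for delay in delays:
        try:
            return connect()
        except ftplib.error_temp:
            # ftplib raises error_temp for 4xx replies such as
            # "421 There are too many connected users".
            # Add up to 10% jitter so many clients do not retry in lockstep.
            sleep(delay + jitter(0, delay * 0.1))
    return None


def connect_anonymous(host="aftp.cmdl.noaa.gov"):
    """Open one anonymous session; a 421 greeting surfaces as error_temp."""
    ftp = ftplib.FTP(host, timeout=30)
    ftp.login()
    return ftp
```

Usage would be something like `ftp = wait_for_slot(connect_anonymous, backoff_delays())`, with a single connection per client.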
[10:13] *** antomatic has quit IRC (Read error: Operation timed out)
[10:14] *** antomatic has joined #archiveteam-bs
[10:14] *** swebb sets mode: +o antomatic
[10:15] *** Coderjoe has quit IRC (Read error: Operation timed out)
[10:22] *** Coderjoe has joined #archiveteam-bs
[11:02] *** kniffy has quit IRC (Ping timeout: 240 seconds)
[11:10] *** kniffy has joined #archiveteam-bs
[11:32] *** GE has joined #archiveteam-bs
[12:22] *** SadDM has joined #archiveteam-bs
[12:22] *** swebb sets mode: +o SadDM
[12:22] *** tychot has quit IRC (Ping timeout: 245 seconds)
[12:24] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[12:27] *** pizzaiolo has joined #archiveteam-bs
[12:37] *** tychot has joined #archiveteam-bs
[12:45] any particular reason archiveteam.org doesn't have HTTPS?
[12:46] *** odemg has quit IRC (Remote host closed the connection)
[12:51] *** bwn has joined #archiveteam-bs
[12:53] *** Honno has quit IRC (Ping timeout: 370 seconds)
[12:59] *** Simpbrain has quit IRC (Remote host closed the connection)
[14:08] *** Honno has joined #archiveteam-bs
[14:18] *** odemg has joined #archiveteam-bs
[14:31] *** atomicthu has joined #archiveteam-bs
[14:31] *** PotcFdk has joined #archiveteam-bs
[14:53] *** vitzli has joined #archiveteam-bs
[14:56] *** Honno has quit IRC (Ping timeout: 370 seconds)
[16:06] because nobody has made it happen yet
[16:07] like with most things in archiveteam that are reasonable ideas, nobody's gotten a bee up their ass to do it yet
[16:37] xmc: who's hosting the wiki?
[16:38] s/archiveteam/any volunteer efforts/
[16:44] *** kniffy has quit IRC (Ping timeout: 240 seconds)
[16:49] *** kniffy has joined #archiveteam-bs
[17:11] i used caddy for the first time today, how AWESOME
[17:22] pizzaiolo: why would you need https to access a wiki
[17:22] "oh no, attackers can spy on my wiki edits!"
[17:24] lol
[17:25] privacy activists would disagree but yeah I don't think it matters much
[17:25] as far as i can tell the login form is a frame to an https page so there shouldn't be any security issues
[17:25] also I forget who hosts the wiki
[17:26] if it's the AT wiki, there's no HTTPS at any phase, login or otherwise
[17:26] yolo
[17:26] there are valid reasons for wanting HTTPS to cover all requests beyond the initial login
[17:27] see e.g. technique popularized by firesheep, edit integrity
[17:27] fortunately these days getting a useful cert isn't too hard what with LE and all
[17:27] someone could narf your session key
[17:28] or just your password *shrug*
[17:28] yes, that is the technique popularized by firesheep
[17:28] HTTPS isn't an unreasonable request, it just hasn't really been on the priority list
[17:33] also haha I like the way Intel formatted this
[17:33] https://gitlab.peach-bun.com/yipdw/random-images/uploads/298582b3b25e313b88950a27915b9e5f/intel6.png
[17:33] "Buy Composer Edition! IT INCLUDES NOTHING"
[17:33] lmao
[17:33] yeah, you have the "All Editions Feature" subheading on the other side
[17:33] Improved buying experience
[17:33] Both Firefox and Chrome are planning to mark non-secure pages with password fields as explicitly non-secure. It may be a good idea to roll out HTTPS at least for the login page before then (though if you go that far you may as well go all the way)
[17:34] I dunno, I guess if the idea is to make people go for the Cluster Edition, it works
[17:38] Is there some sort of filesystem container that compresses the contents automatically? My IRC logs are large but easily compressible, it'd be cool if I could put them in their own filesystem that compresses them transparently
[17:38] And still be easily greppable because it's transparent
[17:38] On Windows you can enable NTFS compression on a per-folder basis
[17:38] *** hook54321 sets mode: +o Asparagir
[17:38] linux
[17:38] There's probably some FUSE file system that would work
[17:40] ftp://aftp.cmdl.noaa.gov/ is up for me
[17:40] must have been a reddit hug of death then
[17:41] yeah
[17:41] it won't work when 100 people go after the same file
[17:42] that's why we have the warrior.
[17:42] btrfs does transparent compression via lzo
[17:42] * arkiver is afk for a bit
[17:42] I dunno if you want to switch to btrfs though
[17:42] when I'm back I'll check if we have all of aftp
[17:42] btrfs will be great when it is stable.
[17:42] not that it's necessarily unstable (I have no idea), but switching filesystems is generally just a pain in the ass no matter what the target
[17:43] *** n00b709 has joined #archiveteam-bs
[17:43] yipdw: it'd just be a mounted image that sits inside my normal filesystem,
[17:44] I guess that works, yeah
[17:44] *** n00b709 has left
[17:45] I never tried using different filesystems for LVM volume groups
[17:45] maybe I should see if that works
[17:45] you know, the next time I decide "Wow, I have nothing better to do than destroy my computer"
[17:47] "I will just do this one thing, how bad can it be? There is even a tutorial!" -> 5 hours later -> "Okay, I have managed to reflash my bios and unbrick my system. Let's never do that again." -> Repeat.
[17:47] https://xkcd.com/349/
[17:48] yeah, basically
[17:48] although, to be fair, I have had very good experiences with LVM
[17:48] it was very handy when I was moving from 2 80 GB SATA SSDs to a 400 GB PCIe
[17:48] add the new drive to a volume group, copy the two SSD VGs to the new one
[17:48] done
[17:49] I was surprised. I was expecting to have to finagle filesystem arcana or some shit and I was disappointed, in a way, that I didn't, because it meant I couldn't make snide "In 2016, ..." tweets
[17:50] *** jrwr has joined #archiveteam-bs
[17:59] *** kurt|rbx1 has quit IRC (Ping timeout: 260 seconds)
[18:02] *** Honno has joined #archiveteam-bs
[18:12] Frogging: ZFS does compression
[18:19] *** vitzli has quit IRC (Quit: Leaving)
[18:20] *** odemg has quit IRC (Remote host closed the connection)
[18:36] *** kniffy has quit IRC (Ping timeout: 260 seconds)
[18:42] *** kniffy has joined #archiveteam-bs
[18:42] ZFS with lz4 is better than sliced bread.
[18:48] *** VADemon has joined #archiveteam-bs
[18:59] *** odemg has joined #archiveteam-bs
[19:05] *** kniffy has quit IRC (Ping timeout: 240 seconds)
[19:06] https://www.reddit.com/r/DataHoarder/comments/5q4xxe/erik_fichtner_on_twitter_please_wget_m_np/dcxq9n9/ is worrying
[19:06] judging by the warrior stats we only have at most 14GB from that ftp
[19:07] while the person in the parent post to the one i linked on reddit has 514GB
[19:09] SketchCow: maybe ask that /u/fuckoffplsthankyou guy if he can upload the 514GB to an AT machine?
[19:11] *** kniffy has joined #archiveteam-bs
[19:15] *** merp has joined #archiveteam-bs
[19:21] *** merp has left
[19:28] *** kniffy has quit IRC (Ping timeout: 240 seconds)
[19:33] *** kniffy has joined #archiveteam-bs
[19:52] *** Jordan has quit IRC (Read error: Operation timed out)
[19:53] *** gourgastl has joined #archiveteam-bs
[19:54] *** Jordan has joined #archiveteam-bs
[20:03] *** Jordan has quit IRC (Remote host closed the connection)
[20:04] *** Jordan has joined #archiveteam-bs
[20:23] *** ravetcofx has joined #archiveteam-bs
[20:31] *** kevinr has joined #archiveteam-bs
[20:41] *** godane has quit IRC (Read error: Operation timed out)
[20:48] *** nickname_ has joined #archiveteam-bs
[21:24] anybody recommend good bang/$ when it comes to VPS providers to run warrior on? I'm running on DigitalOcean atm with moderate success.
[21:26] depending on the budget, but OVH's SoYouStart is pretty much the best you're going to get (not VPSs though, dedis)
[21:27] hmpf... might have to pay 15% sales tax on that since I'm Canadian, but 2GB for $7/mo is very competitive.
[21:29] *** gourgastl has quit IRC (Quit: Page closed)
[21:31] oops, that was the Kimsufi line
[21:33] RAM isn't everything though, those Atom processors will be terrible for, well, pretty much everything
[21:35] ahaha yah, but the scripts are mostly I/O bound, no? The load wouldn't be so crazy as to bottleneck the I/O, would it?
[21:36] on those processors I'd say the CPU would be your issue
[21:37] depends on the project really. I'm maxing out 8 cores of a Xeon E3 on ftp-gov, but that is excessively heavy on cpu
[21:39] ^ I use Scaleway VPS for mine. $10 a month for 6 x x86 cores, 8GB of RAM, 200GB of SSD, 200mbit unmetered.
[21:40] *** godane has joined #archiveteam-bs
[21:43] rocode: have you tried ftp-gov on that? wondering about performance
[21:45] Mine is currently running 2 concurrent of the following projects: wikiteam, urlshort, yuku, pdf, googlecode, ftp, vine, ipernity, yahooanswers, and 5 grab-site grabbers.
[21:45] So, I am not running FTPgov, but I don't think it would be a problem. Something caused me not to run it, probably an error of some sort.
[21:46] ah
[21:46] just wondering as I'm having 'issues' with online/scaleway's network
[21:47] Tried switching to the Amsterdam datacenter?
[21:47] can't, as it's a dedi through online.net rather than scaleway
[21:48] Ah. No clue then.
[21:48] I have never had network issues.
[21:48] for reference, 107ms from OVH in Roubaix to one of the nasa FTPs. From online in paris I'm seeing an average of 1270ms :(
[21:51] Kaz: very simple solution. More ovh :p
[21:51] *** Ravenloft has quit IRC (Read error: Connection reset by peer)
[21:53] I'm enjoying the £20/mo I'm already saving!
[21:53] shame about the 260mbps limit
[21:54] id use hetzner in a heartbeat if it wasnt for their bw caps
[21:54] turns out we're not actually the ideal customer for any provider, who knew?
[21:55] OVH could do away with Kimsufi though, bring the price down for the rest of us
[21:55] Kaz, tempted to test servdiscount - however from what I read their network can be slow
[21:57] They have a 30day trial/moneyback thing
[21:57] it's win/win. The network either works and you keep it, or it's crap so they have no reason not to honor it
[21:57] https://servdiscount.com/en/services/payment-methods.html is not that good tho
[21:58] ?
[21:58] visa/mastercard accepted
[21:58] for a 5% fee
[21:58] if you scroll down
[21:58] ah, yeah they hide that a bit out the way
[21:58] Can do SEPA, but my bank charges £4 per payment on it
[21:59] although 2 eur on a 40eur server isnt too bad
[22:02] http://www.speedtest.net/result/4242574225.png
[22:02] admittedly that's old
[22:02] wonderful - the order form has suddenly gone all german on me
[22:02] hah
[22:03] the url has /en/ in it - yet its gone all German
[22:03] https://www.peeringdb.com/net/1007 too - doesn't seem too bad but doubt you'll get anything worthwhile outside of europe
[22:03] im pretty amazed with my box in LA. 140ms between my home and the server, RDP still feels like its at OVH in france, and gameservers and stuff run well
[22:03] can someone requeue the flickr items or change the warrior default to something else? I'm feeling like it doesn't do much right now.
[22:06] Kaz, their order form cant take a full UK postcode
[22:10] ah
[22:14] i've noticed ftp-gov has been heavy on CPU as well Kaz, any idea why? Also, why does the item have to fit in memory?
[22:14] if it's a matter of having to send it off to the AT servers, then we should be able to read and send in chunks, no?
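Editor's sketch, not part of the log: "read and send in chunks" can be illustrated generically as below. This is not how the ftp-gov scripts actually upload (the log does not say); iter_chunks and stream_copy are hypothetical helper names, and dst stands in for any writable sink such as a socket file or an upload stream. Only one chunk is held in memory at a time, and a checksum is accumulated along the way.

```python
import hashlib
import io


def iter_chunks(fileobj, chunk_size=1024 * 1024):
    """Yield successive blocks of at most chunk_size bytes from a binary file."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            return
        yield chunk


def stream_copy(src, dst, chunk_size=1024 * 1024):
    """Copy src to dst chunk by chunk; return (bytes copied, SHA-1 hex digest)."""
    digest = hashlib.sha1()
    total = 0
    for chunk in iter_chunks(src, chunk_size):
        digest.update(chunk)   # checksum is built incrementally
        dst.write(chunk)       # the sink only ever sees one chunk at a time
        total += len(chunk)
    return total, digest.hexdigest()
```

Peak memory is then bounded by chunk_size rather than by the size of the item.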
[22:16] honestly no idea, I haven't looked much into what the scripts are actually doing other than pointing wpull at some ftp sites
[22:17] the ftp-gov scripts use wpull's on-disk URL database; if there's a memory limit somewhere, at least it isn't that
[22:20] k
[22:21] as for high CPU, we notice the same phenomenon in archivebot, which is also using wpull
[22:21] the cause is presently unknown because we haven't profiled yet
[22:22] in archivebot's case it's possible it's not wpull proper but rather one of our plugins
[22:22] etc.
[22:23] yipdw, are you taking pipeline applications?
[22:23] maybe soon. I want to keep watching the new pipeline code for a bit
[22:24] coolio... I've been getting a lot of memory surprises with long-running python scripts these days myself
[22:24] one time it was a memory leak from a c library interface, the other time it was one of those weird suboptimal GC cases
[22:25] we usually don't hit memory problems in pipelines
[22:25] although sometimes it happens
[22:25] it's odd
[22:25] i'd really love a visualvm equivalent for CPython
[22:26] there are some weird profiling tools for python, but it's all opaque to me
[22:28] there are a few but the ones I've seen give you a flat or graph profile at the end of some block, or at process termination. i want the ability (either by explicit instrumentation or VM hooks) to watch the process as it runs
[22:28] there's so much information you can get about process behavior by watching it over time vs. trying to reconstruct that history from a profiler snapshot. it's strange that this isn't more common in profiler land
[22:29] like offhand there's visualvm, instruments, telemetry, uh
[22:29] vtune maybe
[22:42] yipdw: you may have seen this already, but just in case you haven't, is this helpful: http://www.brendangregg.com/blog/2016-10-27/dtrace-for-linux-2016.html (the person who worked a lot on bpf)
[22:44] dtrace is pretty nice and there is some work on integrating dtrace probes into CPython, yeah
[22:44] I hope that takes off
[22:44] I didn't know DTrace made it into Linux
[22:45] oh, wait, it didn't
[22:45] well maybe one day
[22:56] *** odemg has quit IRC (Remote host closed the connection)
[23:02] *** GE has quit IRC (Quit: zzz)
[23:05] *** Honno has quit IRC (Ping timeout: 370 seconds)
[23:20] *** odemg has joined #archiveteam-bs
[23:23] from the post, it sounds like bpf offers everything dtrace does except the vast library of pre-existing probes
[23:25] *** Stiletto has quit IRC (Read error: Operation timed out)
[23:26] *** dashcloud has quit IRC (Read error: Operation timed out)
[23:32] *** dashcloud has joined #archiveteam-bs
[23:37] *** BlueMaxim has joined #archiveteam-bs
[23:55] *** Stil3tt0 has joined #archiveteam-bs
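Editor's sketch, not part of the log: on the earlier wish to watch a Python process as it runs rather than read one profile at exit, a very small approximation is possible with the standard tracemalloc module and a background thread. MemoryWatcher is a hypothetical name, it only tracks memory allocated by Python (not CPU, and not leaks inside C libraries), and it is nowhere near a visualvm equivalent. It just records a time series that can be inspected or plotted while the job is still running.

```python
import threading
import time
import tracemalloc


class MemoryWatcher:
    """Record (monotonic time, current bytes, peak bytes) at a fixed interval."""

    def __init__(self, interval=1.0):
        self.interval = interval
        self.samples = []
        self._stop = threading.Event()
        self._thread = None

    def sample_once(self):
        current, peak = tracemalloc.get_traced_memory()
        self.samples.append((time.monotonic(), current, peak))

    def _run(self):
        # Event.wait returns False on timeout and True once stop() sets the event.
        while not self._stop.wait(self.interval):
            self.sample_once()

    def start(self):
        if not tracemalloc.is_tracing():
            tracemalloc.start()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
        self.sample_once()  # always record one final data point
```

A long-running script would call start() near the top and read watcher.samples whenever it wants a look at the trend so far.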