[00:01] *** metal_cam has quit IRC (Read error: Operation timed out)
[00:03] *** balrog has quit IRC (Read error: Operation timed out)
[00:08] *** balrog has joined #archiveteam-bs
[00:08] *** swebb sets mode: +o balrog
[00:24] *** metalcamp has quit IRC (Ping timeout: 506 seconds)
[01:13] *** BlueMaxim has joined #archiveteam-bs
[02:32] *** JesseW has joined #archiveteam-bs
[03:01] hook54321: isn't the remote server the one controlling access? how would you expect to just bypass that?
[03:02] I would assume that's how it works. I'm not really sure though.
[03:03] yes. it would be hilarious if the client were responsible for enforcing this.
[03:04] I think some of the teachers have access to the network drives...
[03:05] I would recommend not trying to get expelled, if you want to graduate from wherever you're at
[03:05] I'm not trying to
[03:06] What if I asked some teachers if they could copy the stuff onto an external drive for me?
[03:06] sounds like it could work
[03:07] I kinda doubt they would do it though
[03:07] Well, I know one teacher pretty well
[03:45] *** ndiddy has quit IRC (Read error: Connection reset by peer)
[04:16] Someone compiled a list of every open FTP server on the IPv4 Internet: https://github.com/massivedynamic/openftp4
[04:17] hm, we probably want those for the FTP project
[04:51] *** Stiletto has joined #archiveteam-bs
[04:52] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[04:53] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[04:59] *** Sk1d has joined #archiveteam-bs
[05:23] *** Frogging has quit IRC (El Psy Kongroo!)
[05:26] *** RichardG_ has joined #archiveteam-bs
[05:26] *** RichardG has quit IRC (Read error: Connection reset by peer)
[05:30] *** Frogging has joined #archiveteam-bs
[06:01] *** GE has joined #archiveteam-bs
[06:39] *** GE has quit IRC (Ping timeout: 255 seconds)
[07:11] I tried doing something like that once. Never again.
[07:16] *** metalcamp has joined #archiveteam-bs
[07:25] HCross2: why? what happened?
[07:25] MrRadar: wow, I keep forgetting stuff like this is feasible nowadays
[07:26] now I want to scan for every MUD/MUSH server
[07:26] Well, when you contact port 21 on every IP on the internet, you hit government ranges. Governments don't like that
[07:26] Ah, right. And they don't always run on predictable ports either.
[07:27] If I remember right, the MoD got in contact
[07:28] Someone once did something like that except for Minecraft servers; I think they received a cease-and-desist notice.
[07:28] scanning the entire internet is serious business, I guess
[07:31] there was a DEF CON talk on that
[07:31] What's the difference between the FTP project and ArchiveBot?
[07:32] they're not related at all
[07:32] ArchiveBot is a service
[07:32] Isn't ArchiveBot compatible with FTP though? Why don't they just use that?
[07:32] the FTP project is a regular project
[07:32] because slamming terabytes of FTP through a few pipelines is not a good idea
[07:33] Ah
[07:33] there are other ways to get archives of FTP sites than downloading them
[07:33] single FTP servers can be archived by ArchiveBot
[07:33] There are?
[07:34] https://www.youtube.com/watch?v=UOWexFaRylM
[07:34] Mass Scanning the Internet - DEF CON 22 (2014)
[07:34] yes? downloading is one option, but for problematic sites you also have the option of writing to the maintainer
[07:34] hook54321: ArchiveBot pipelines are a finite resource
[07:34] and as far as FTP is concerned, a file mirror is a pretty good mirror
[07:34] HCross2: I know
[07:35] How is the FTP stuff handled if the maintainer replies?
[07:36] I don't know, I'm throwing it out there as a plan B
[07:36] there is of course a Warrior project for the FTP grab
[07:38] *** schbirid has joined #archiveteam-bs
[07:39] *** GLaDOS has quit IRC (Oh crap, I died.)
[07:40] *** GLaDOS has joined #archiveteam-bs
[08:28] *** PurpleSym sets mode: +o arkiver
[08:29] *** PurpleSym sets mode: +o midas
[08:40] *** odemg has joined #archiveteam-bs
[08:42] you guys may want to go after this: https://www.youtube.com/user/UCBerkeley/playlists
[08:42] http://news.berkeley.edu/2016/09/13/a-statement-on-online-course-content-and-accessibility/
[08:47] I honestly think they should be contacted; if they're as committed to keeping the content available as they claim, they should be willing to hand everything over and have it on IA
[09:10] *** odemg has quit IRC (Quit: Leaving)
[09:26] *** brayden has quit IRC (Ping timeout: 633 seconds)
[09:28] huh, Chrome 52 offers MIDI control permissions for websites
[09:28] does this mean I can interact with websites via my Launchpad? because if so, that is stupendously badass
[09:30] *** brayden has joined #archiveteam-bs
[09:30] *** swebb sets mode: +o brayden
[09:30] oh there's an entire subsection of WebAudio about this, nice
[09:36] *** GE has joined #archiveteam-bs
[10:03] *** VADemon has joined #archiveteam-bs
[10:23] *** godane has quit IRC (Read error: Operation timed out)
[10:26] *** godane has joined #archiveteam-bs
[10:45] wtf, the first video of the computer science playlist was webm
[10:58] *** dashcloud has quit IRC (Read error: Connection reset by peer)
[10:58] *** dashcloud has joined #archiveteam-bs
[10:59] *** GE has quit IRC (Remote host closed the connection)
[11:23] *** dashcloud has quit IRC (Read error: Operation timed out)
[11:27] *** dashcloud has joined #archiveteam-bs
[11:28] wow, i'm so glad youtube-dl has the -U option
[11:28] couldn't get Twitch streams until i ran it and bam, it's working
[11:37] IA is totally fine with uploading 15GB videos, right?
[11:37] the roguelike conference was yesterday and i'm downloading the Twitch streams to upload to IA because i don't trust Twitch
[11:39] *** GE has joined #archiveteam-bs
[11:40] Yeah
[11:40] ok, cool
[11:40] wow, i REALLY don't like Twitch, lol
[11:40] have to download a 300KB .part file for each chunk, then combine them all into the video later. taking forever with a 12.5GB file, lol
[11:41] Yeahhhhh
[11:41] Twitch
[11:48] twitch was a fun project.
[11:56] wow, they killed my download, lol
[11:56] i was bouncing between 150 and 1.5MB/s
[11:56] now i'm bouncing between 5MB-7MB/s
[11:56] glad it resumes, too
[11:56] hopefully they're not sending me fake data
[11:57] i should've probably limited my speed to a "normal" streamer
[11:57] I was under the impression Twitch's CDN limited download speed anyway?
[12:00] oh, maybe so
[12:00] my download maxes out around 9MB/s, so i'm almost hitting it. just odd that i was only hitting 1.5MB/s for 20 minutes, it dies, and now i'm getting 7.5MB/s
[12:23] *** fie_ has joined #archiveteam-bs
[12:25] *** fie__ has quit IRC (Ping timeout: 244 seconds)
[12:26] *** BlueMaxim has quit IRC (Quit: Leaving)
[12:50] so many pieces that Windows is choking just to show the directory now
[12:50] 14.5k+ pieces it's downloading to recombine, lol
[13:12] atrocity: I assume it's server-side caching and optimisation of some sort
[13:13] https://www.ets.berkeley.edu/news/fall-2015-changes-course-capture-webcast-service :/
[13:16] *** GE has quit IRC (Remote host closed the connection)
[13:16] joepie91: that makes no sense
[13:16] they're just admitting they're going to do it for their students, but not for the world, because of costs
[13:16] the cost is already done if they're doing it for their students
[13:22] atrocity: it was hosted on YT so what costs did they really have?
[14:02] *** fie_ has quit IRC (Quit: Leaving)
[14:02] exactly what i mean, lol
[14:03] *** fie has joined #archiveteam-bs
[14:17] the number of parts a Twitch VOD has is roughly $duration_milliseconds/4000
[14:24] or in other words, one part per four seconds
[14:29] *** brayden has quit IRC (Read error: Operation timed out)
[14:31] https://torrentfreak.com/elsevier-wants-cloudflare-to-expose-pirate-sites-160917/
[14:31] "In the ongoing copyright infringement lawsuit against alleged pirate sites Sci-Hub, Libgen and Bookfi, academic publisher Elsevier wants help from Cloudflare. The publisher informs the court that a subpoena against Cloudflare is needed to expose the personal details of the sites' owners."
[14:34] "In addition to contacting Cloudflare, the academic publisher also requested information from Whois Privacy Corp. – the domain registration anonymization service used by both Libgen.org and Bookfi.org – but the company hasn’t responded to these requests at all."
[14:34] apparently internet.bs' WHOIS privacy thing is giving them the runaround, heh
[14:43] *** GE has joined #archiveteam-bs
[14:47] it begins
[14:47] grrrrr
[15:30] I just noticed archive.org refuses to show nifty homepages entirely
[15:31] "this URL has been excluded"
[15:31] does that mean robots.txt? because that doesn't ban everything: http://homepage2.nifty.com/robots.txt
[15:34] *** dashcloud has quit IRC (Remote host closed the connection)
[15:40] *** dashcloud has joined #archiveteam-bs
[15:51] *** Aranje has joined #archiveteam-bs
[16:22] *** brayden has joined #archiveteam-bs
[16:22] *** swebb sets mode: +o brayden
[16:29] *** RichardG_ has quit IRC (Read error: Connection reset by peer)
[16:29] *** RichardG has joined #archiveteam-bs
[16:47] It's because their robots.txt parser is stupid and they don't seem to care
[17:34] *** ndiddy has joined #archiveteam-bs
[18:29] looks like it's not blocked because of robots.txt
[18:30] otherwise the Wayback Machine would have said something about robots.txt
[18:33] *** JesseW has joined #archiveteam-bs
[18:45] *** tomwsmf_ has joined #archiveteam-bs
[19:18] *** schbirid has quit IRC (Quit: Leaving)
[19:19] *** tomwsmf_ has quit IRC (Read error: Operation timed out)
[19:33] i have a question about that
[19:34] has anyone else wanted to look at a site in the Internet Archive, only to find it went down and was replaced with a parked page? the parked page has a robots.txt file and the original didn't, but because the parked page does, you can't see any backups of the page
[19:34] it's pretty annoying
[19:34] happens all the time
[19:34] known issue
[19:36] also, parked pages are pretty annoying in general tbh
[19:36] your point?
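The parked-domain problem above comes down to a blanket rule in the new robots.txt. A minimal sketch of the difference, assuming a hypothetical original robots.txt that blocks only one directory and a hypothetical parked-page robots.txt that disallows everything. It uses Python's standard `urllib.robotparser`, not the Wayback Machine's own parser, and it assumes the archive honours rules addressed to the `ia_archiver` user-agent; the example URL is also hypothetical.

```python
import urllib.robotparser

# Hypothetical robots.txt for the original site: only /cgi-bin/ is blocked.
ORIGINAL_ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""

# Hypothetical robots.txt served by a parked domain: everything is blocked.
PARKED_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "http://example.com/old-page.html"  # hypothetical page from the old site
    print("original:", allowed(ORIGINAL_ROBOTS_TXT, "ia_archiver", url))  # original: True
    print("parked:", allowed(PARKED_ROBOTS_TXT, "ia_archiver", url))      # parked: False
```

If an archive applies whatever robots.txt the domain serves today to its older captures, the second case is what hides the old pages, even though the original site never excluded them.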
[19:36] nobody's going to pay $500 for a domain
[19:40] *** dashcloud has quit IRC (Read error: Operation timed out)
[19:42] The only joy I ever got out of parked websites was wondering why they all had that same picture of that girl
[19:42] now it's all pictures of granite and shit
[19:42] fun fact: i was looking up what Sunsoft was up to a few days ago and they sold their US domain http://sunsoftgames.com/
[19:43] i sent Sunsoft Japan an email about it and they haven't replied yet
[19:43] My mind just had a thought about what the internet has done to that image and I regret going there
[19:44] *** godane has quit IRC (Quit: Leaving.)
[19:44] *** godane has joined #archiveteam-bs
[19:45] *** dashcloud has joined #archiveteam-bs
[19:45] *** ndizzle has joined #archiveteam-bs
[19:52] *** ndiddy has quit IRC (Read error: Operation timed out)
[19:53] *** ndizzle is now known as ndiddy
[20:04] *** ndiddy has quit IRC (Read error: Connection reset by peer)
[20:06] *** ndiddy has joined #archiveteam-bs
[20:08] *** ndizzle has joined #archiveteam-bs
[20:16] *** ndiddy has quit IRC (Read error: Operation timed out)
[20:17] *** ndizzle has quit IRC (Read error: Operation timed out)
[20:17] *** ndiddy has joined #archiveteam-bs
[20:31] *** JesseW has quit IRC (Quit: Leaving.)
[20:32] *** JesseW has joined #archiveteam-bs
[20:39] *** BartoCH has quit IRC (Ping timeout: 260 seconds)
[20:39] *** BartoCH has joined #archiveteam-bs
[20:41] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[20:56] PurpleSym: having a look at the WARC now
[21:02] sorry for the delay :/
[21:02] Flickr put up a new version, going to check if everything still works
[21:02] if it does, I'm going to start the test grab
[21:27] *** ravetcofx has quit IRC (Read error: Operation timed out)
[21:30] *** metalcamp has quit IRC (Read error: Operation timed out)
[21:39] *** ravetcofx has joined #archiveteam-bs
[21:51] *** dashcloud has quit IRC (Remote host closed the connection)
[21:54] *** dashcloud has joined #archiveteam-bs
[22:01] *** VADemon has quit IRC (Quit: left4dead)
[22:49] you can actually fight for a domain if somebody is using it for ads
[22:49] ppl apparently get sued over that
[22:49] i used to own a reddit "typo" domain and made some money off of it years ago and didn't renew it for that reason, lol
[23:01] *** JesseW has joined #archiveteam-bs
[23:07] *** GE has quit IRC (Quit: zzz)
[23:15] *** tomwsmf_ has joined #archiveteam-bs
[23:37] *** kristian_ has joined #archiveteam-bs
[23:44] "nobody's going to pay $500 for a domain" lol. just lol. sorry.
[23:45] I have a feeling that industry survives due to people with more money than sense tbh
[23:46] just need a few of those very high margin sales of not-intrinsically-valuable items profitable
[23:46] :p
[23:46] I'm just speculating. the only thing I know for sure is that it's cancer
[23:47] to be profitable*
[23:47] where's my brain
[23:48] "intrinsically valuable" is not really a thing that means anything
[23:48] (is land at location X intrinsically valuable? gold (at market prices)?)
[23:49] fair enough, since value is indeed determined by how much people will pay for it. I guess my point was that very few people are willing to pay that, but enough are to make it worthwhile
[23:53] PurpleSym: the resource records look good
[23:53] The only thing I'm not sure about is the WARC-Target-URI.
[23:54] Maybe this should be the location on your machine, or the exact URL you synced this from to your machine
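For the WARC-Target-URI question, one option is to record the URL the file was originally synced from, rather than its path on the local disk. A minimal sketch of a single WARC/1.0 `resource` record built by hand with the standard library, assuming that choice. The function name `resource_record` and the example URL are hypothetical, and a real grab would more likely use an existing WARC library than hand-assembled headers.

```python
import uuid
from datetime import datetime, timezone


def resource_record(target_uri: str, payload: bytes, content_type: str) -> bytes:
    """Build one WARC/1.0 'resource' record: header block, blank line, payload, CRLF CRLF."""
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        # WARC-Date is a UTC timestamp such as 2016-09-17T23:53:00Z
        "WARC-Date: " + datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        # Assumption: the original source URL, not a local filesystem path
        f"WARC-Target-URI: {target_uri}",
        f"Content-Type: {content_type}",
        # Content-Length counts only the payload (the record's content block)
        f"Content-Length: {len(payload)}",
    ]
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # Each record is terminated by two CRLFs after the content block
    return head + payload + b"\r\n\r\n"


if __name__ == "__main__":
    rec = resource_record("https://example.com/files/notes.txt", b"hello", "text/plain")
    print(rec.decode("utf-8"))
```

Using the source URL keeps the record replayable under the address people would actually look up; the local path could still be kept in a separate, non-standard header if it matters.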