[00:00] *** goekesmi has joined #archiveteam-bs
[00:04] *** GLaDOS has quit IRC (Read error: Operation timed out)
[00:05] *** GLaDOS has joined #archiveteam-bs
[00:20] *** primus104 has quit IRC (Leaving.)
[00:27] *** mistym has quit IRC (Remote host closed the connection)
[00:34] *** SN4T14 has joined #archiveteam-bs
[00:35] *** thechip__ has joined #archiveteam-bs
[00:35] *** thechip_ has quit IRC (Read error: Operation timed out)
[00:36] *** wp494_ has joined #archiveteam-bs
[00:39] *** SN4T14__ has quit IRC (Ping timeout: 369 seconds)
[00:39] *** wp494 has quit IRC (Read error: Operation timed out)
[00:45] *** lytv has quit IRC (Ping timeout: 306 seconds)
[00:48] *** lytv has joined #archiveteam-bs
[00:48] *** mistym has joined #archiveteam-bs
[00:59] *** Ravenloft has joined #archiveteam-bs
[01:04] *** wp494_ is now known as wp494
[01:23] *** mutoso has joined #archiveteam-bs
[01:59] *** GLaDOS has quit IRC (Read error: Operation timed out)
[02:00] *** GLaDOS has joined #archiveteam-bs
[02:10] *** GLaDOS has quit IRC (Ping timeout: 260 seconds)
[02:11] *** GLaDOS has joined #archiveteam-bs
[02:26] *** GLaDOS has quit IRC (Read error: Operation timed out)
[02:26] *** GLaDOS has joined #archiveteam-bs
[02:39] *** GLaDOS has quit IRC (Read error: Operation timed out)
[02:42] *** GLaDOS has joined #archiveteam-bs
[02:52] *** GLaDOS has quit IRC (Ping timeout: 260 seconds)
[02:55] *** GLaDOS has joined #archiveteam-bs
[03:01] *** dan_ has quit IRC (Ping timeout: 260 seconds)
[03:06] *** dan_ has joined #archiveteam-bs
[03:15] *** SimpBrain has quit IRC (Read error: Connection reset by peer)
[03:18] *** Ara_ has joined #archiveteam-bs
[03:24] *** brayden has quit IRC (Read error: Operation timed out)
[03:32] https://www.youtube.com/watch?v=jKfmHy4n_o4
[03:34] *** mistym has quit IRC (Remote host closed the connection)
[03:58] *** mistym has joined #archiveteam-bs
[04:00] *** Ara_ has quit IRC (Read error: Connection reset by peer)
[04:08] *** aaaaaaaaa has quit IRC (Leaving)
[04:42] *** balrog has quit IRC (Ping timeout: 260 seconds)
[04:53] *** rejon has joined #archiveteam-bs
[04:58] *** balrog has joined #archiveteam-bs
[05:19] *** brayden has joined #archiveteam-bs
[05:40] so i'm looking at sbs news video files
[05:41] based on the way the ids are done i may be grabbing stuff from 1998
[05:41] but i have no idea
[06:10] *** GLaDOS has quit IRC (Read error: Operation timed out)
[06:11] *** GLaDOS has joined #archiveteam-bs
[06:28] *** wp494 has quit IRC (Ping timeout: 740 seconds)
[06:53] *** mistym has quit IRC (Remote host closed the connection)
[06:53] *** wp494 has joined #archiveteam-bs
[07:22] *** primus104 has joined #archiveteam-bs
[07:37] *** Jonimus has quit IRC (Ping timeout: 370 seconds)
[07:37] *** dashcloud has quit IRC (Ping timeout: 260 seconds)
[07:37] *** dashcloud has joined #archiveteam-bs
[08:01] *** vitzli has joined #archiveteam-bs
[08:11] hello, once, there was a project called CD3WD (www.cd3wd.com), created by Alex Weir, aka @alexweir1949. CD3WD is a collection of high-quality 3rd world development information on agriculture, health, appropriate technology, construction, food processing, crop storage, woodwork, metalwork, electrical trades, education, and computer skills.
[08:13] The last tweets and facebook updates from Alex Weir were on 2014-05-12, and the website cd3wd.com has been unavailable since November-December 2014; it seems to have been taken over by the hosting company or somebody else, and the content is unavailable. BUT, there is a collection of DVD images in torrent files at http://fastspeedtest.net/mirrors/cd3wd/ - would anyone be interested in having a copy?
[08:15] *** dashcloud has quit IRC (Read error: Operation timed out)
[08:18] *** dashcloud has joined #archiveteam-bs
[08:26] *** schbirid has joined #archiveteam-bs
[09:42] *** wp494 has quit IRC (ircd.shaw.ca irc.shaw.ca)
[09:42] *** Start has quit IRC (ircd.shaw.ca irc.shaw.ca)
[09:42] *** SadDM has quit IRC (ircd.shaw.ca irc.shaw.ca)
[09:42] *** xtr-201 has quit IRC (ircd.shaw.ca irc.shaw.ca)
[09:42] *** garyrh has quit IRC (ircd.shaw.ca irc.shaw.ca)
[09:42] *** pikhq has quit IRC (ircd.shaw.ca irc.shaw.ca)
[09:42] *** useretail has quit IRC (ircd.shaw.ca irc.shaw.ca)
[09:42] *** vitzli has quit IRC (Quit: Leaving)
[09:42] *** dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.)
[09:44] *** dashcloud has joined #archiveteam-bs
[10:02] *** wp494 has joined #archiveteam-bs
[10:02] *** Start has joined #archiveteam-bs
[10:02] *** SadDM has joined #archiveteam-bs
[10:02] *** garyrh has joined #archiveteam-bs
[10:02] *** pikhq has joined #archiveteam-bs
[10:02] *** useretail has joined #archiveteam-bs
[10:17] so looks like there are not a lot of Led Zeppelin bootlegs between 1970-04-19 and 1970-06-27
[10:18] at least from the site i got the urls from
[10:29] *** cloudmons has quit IRC (Read error: Connection reset by peer)
[10:29] *** cloudmons has joined #archiveteam-bs
[10:34] SketchCow: btw another collection: https://archive.org/search.php?query=subject%3A%22The%20Web%20Ahead%22&sort=-publicdate
[10:34] it's been there for the last week i think
[10:36] I saw
[10:37] i'm grabbing 98 to 100 right now
[11:17] *** w0rp has quit IRC (Ping timeout: 265 seconds)
[11:17] *** Ravenloft has quit IRC (Ping timeout: 265 seconds)
[11:18] *** lytv has quit IRC (Ping timeout: 265 seconds)
[11:18] *** dx- has quit IRC (Ping timeout: 265 seconds)
[11:18] *** dx has joined #archiveteam-bs
[11:19] *** w0rp has joined #archiveteam-bs
[11:20] *** jk[[SVP]] has joined #archiveteam-bs
[11:20] *** vitzli has joined #archiveteam-bs
[11:21] *** lytv has joined #archiveteam-bs
[11:21] *** SketchCo1 has joined #archiveteam-bs
[11:21] *** Kazzy_ has joined #archiveteam-bs
[11:22] *** BlueMaxim has quit IRC (Read error: Connection reset by peer)
[11:24] *** Sk2d has joined #archiveteam-bs
[11:26] *** Sk1d has quit IRC (hub.se efnet.portlane.se)
[11:26] *** jk[SVP] has quit IRC (hub.se efnet.portlane.se)
[11:26] *** Gfy has quit IRC (hub.se efnet.portlane.se)
[11:26] *** pwnsrv has quit IRC (hub.se efnet.portlane.se)
[11:26] *** underscor has quit IRC (hub.se efnet.portlane.se)
[11:26] *** Kazzy has quit IRC (hub.se efnet.portlane.se)
[11:26] *** SketchCow has quit IRC (hub.se efnet.portlane.se)
[11:28] *** Gfy_ has joined #archiveteam-bs
[11:42] *** Gfy_ is now known as Gfy
[11:42] *** jk[[SVP]] is now known as jk[SVP]
[11:42] *** brayden has quit IRC (Quit: Leaving)
[11:42] *** underscor has joined #archiveteam-bs
[11:42] *** Kazzy_ is now known as Kazzy
[11:42] *** Sk2d is now known as Sk1d
[11:56] *** yan has joined #archiveteam-bs
[12:08] *** primus104 has quit IRC (Leaving.)
[12:24] *** dashcloud has quit IRC (Read error: Operation timed out)
[12:31] *** dashcloud has joined #archiveteam-bs
[12:42] *** sankin has joined #archiveteam-bs
[12:49] *** Jonimus has joined #archiveteam-bs
[12:52] *** brayden has joined #archiveteam-bs
[13:40] *** ionpulse has quit IRC (Read error: Connection reset by peer)
[13:46] *** Start has quit IRC (Disconnected.)
[13:47] *** ionpulse has joined #archiveteam-bs
[13:53] .tw https://twitter.com/apblake/status/581090532539138050
[13:53] Uber's headquarters in the Netherlands is being raided right now by feds (via @joepie91) pic.twitter.com/iORamlZauX (@apblake)
[14:32] *** pwnsrv has joined #archiveteam-bs
[14:34] *** mistym has joined #archiveteam-bs
[14:34] *** Start has joined #archiveteam-bs
[14:35] *** Start_ has joined #archiveteam-bs
[14:35] *** Start has quit IRC (Read error: Connection reset by peer)
[14:39] *** mistym has quit IRC (Remote host closed the connection)
[14:43] *** primus104 has joined #archiveteam-bs
[14:46] *** Start_ has quit IRC (Read error: Connection reset by peer)
[14:46] *** Start has joined #archiveteam-bs
[14:56] *** mistym has joined #archiveteam-bs
[15:10] *** primus104 has quit IRC (Leaving.)
[15:18] *** Start has quit IRC (Disconnected.)
[15:25] *** Start has joined #archiveteam-bs
[15:26] *** Start has quit IRC (Read error: Connection reset by peer)
[15:28] *** Start has joined #archiveteam-bs
[15:42] *** dashcloud has quit IRC (Read error: Operation timed out)
[15:47] *** dashcloud has joined #archiveteam-bs
[15:51] *** garyrh has quit IRC (Remote host closed the connection)
[15:51] *** Start has quit IRC (Disconnected.)
[16:01] *** vitzli has quit IRC (Quit: Leaving)
[16:06] joepie91_: heheheheh
[16:28] *** Start has joined #archiveteam-bs
[16:29] *** Start_ has joined #archiveteam-bs
[16:29] *** Start has quit IRC (Read error: Connection reset by peer)
[16:29] *** Start_ is now known as Start
[16:32] *** primus104 has joined #archiveteam-bs
[16:45] *** Start has quit IRC (Disconnected.)
[16:56] *** aaaaaaaaa has joined #archiveteam-bs
[17:21] *** mistym has quit IRC (Remote host closed the connection)
[17:36] *** mistym has joined #archiveteam-bs
[17:43] *** SketchCo1 is now known as SketchCOw
[17:43] *** SketchCOw is now known as SketchCow
[17:43] Cameron_D: ersi Famicoman raylee - ops
[17:50] fuck, i hate it when wget OOMs after days of mirroring
[17:50] piece of shit
[17:51] https://pbs.twimg.com/media/CBAoaU1UwAIUPIc.jpg
[17:56] *** Start has joined #archiveteam-bs
[18:04] schbirid: that's one of the big reasons ArchiveBot is now using wpull
[18:37] *** Start has quit IRC (Disconnected.)
[18:54] yipdw: there's OOM mitigation in wpull now?
[18:54] it probably doesn't leak memory like wget (or what's its problem)
[18:57] I think wpull uses temporary files, rather than keeping everything in memory. Plus, if it does crash, I think there is some ability to resume a job.
[18:59] the wiki could use some updating on manually archiving sites
[19:00] a lot of stuff about even wget is outdated, and warcs are told more like a story of "how archiveteam got told by the internet archive" than "MAKE SURE YOU DO THIS."
[19:03] *** Start has joined #archiveteam-bs
[19:09] > Amazon Cloud Drive Now Includes Unlimited Cloud Storage Plans
[19:09] that's going to end well...
[19:09] internetarchive.bak is solved, everybody go home
[19:10] LOL
[19:13] it's "unlimited" because they rate-limit uploads
[19:14] oh, so as your drive fills up, your upload slows down? that's a hilarious way to do technically infinite storage
[19:15] it's a trickle, but you can keep on uploading forever!
[19:16] We may terminate the Agreement or restrict, suspend or terminate your use of the Service at our discretion without notice at any time, including if we determine that your use violates the Agreement, is improper, substantially exceeds or differs from normal use by other users, or otherwise involves fraud or misuse of the Service or harms our interests or those of another user of the Service.
[19:17] So no hoarders or people we don't like
[19:17] I think it's a constant speed regardless of how much you've used
[19:17] * mhazinsk is using ~5 TB of Google Drive's "unlimited" storage and has been getting about 10 Mbps
[19:19] >Exceeds normal use by other users
[19:20] I see one obvious solution to that bit, sign up our friends and use their accounts too, so everyone uses tons of space
[19:20] joepie91_: not explicitly so but we haven't had leakage in recent versions
[19:22] wpull still has a few spots of unbounded resource consumption; one of them is the wpull log
[19:23] if you don't care about that you can silence it
[19:23] I suspect that can be fixed externally however
[19:23] add a proper log rotator process etc.
[19:24] when making warcs the verbose log goes into the meta-warc so that's a bit trickier
[19:24] but seriously that 5tb would be really handy as a staging point for jobs
[19:24] does wpull handle incoming gzipped pages for mirroring?
[19:25] if you say you accept gzip, wget fails to parse them
[19:25] *** Start has quit IRC (Disconnected.)
[19:27] Ctrl-S: I just got an 8TB Seagate disk for that purpose
[19:27] schbirid: yes, it does
[19:27] and I'm getting a NAS which will probably end up with something like 24TB
[19:27] pacman -Rsn wget
[19:27] depending on how I configure it
[19:28] chfoo: any recommendations on archiving ftp? I'm currently using lftp mirror with both debug logging and verbose logging; debug logging logs all the FTP requests and responses
[19:32] *** Start has joined #archiveteam-bs
[19:33] *** Start_ has joined #archiveteam-bs
[19:33] *** Start has quit IRC (Read error: Connection reset by peer)
[19:35] balrog: nothing specific that i can recall right now. but if you can find something that lftp can do but wpull can't, it would be helpful to file an issue about it
[19:36] *** Start_ is now known as Start
[19:45] *** Start_ has joined #archiveteam-bs
[19:45] *** Start has quit IRC (Read error: Connection reset by peer)
[19:45] *** Start_ is now known as Start
[19:47] *** SN4T14_ has joined #archiveteam-bs
[19:56] *** SN4T14 has quit IRC (Ping timeout: 512 seconds)
[20:22] *** Start has quit IRC (Disconnected.)
[20:29] *** mistym has quit IRC (Remote host closed the connection)
[20:51] *** mistym has joined #archiveteam-bs
[20:59] *** schbirid has quit IRC (Leaving)
[20:59] *** sankin has quit IRC (Leaving.)
[21:01] *** primus has joined #archiveteam-bs
[21:03] *** schbirid has joined #archiveteam-bs
[21:04] *** schbirid has quit IRC (Client Quit)
[21:10] *** schbirid has joined #archiveteam-bs
[21:10] chfoo: how fast does --delete-after=False prune? i do not have enough space
[21:11] and does "--recursive --level inf --no-parent=true" equal "-m -np"?
[21:13] schbirid: --delete-after is the same as wget, it deletes right after downloading a file.
[21:13] oops, i had no idea that existed :)
[21:13] thanks
[21:14] -m has --no-remove-listing too
[21:15] should be good
[21:15] i am seeing missing characters of my font in the log
[21:15] INFO - Fetched �http://www
[21:15] the char before http is a question mark for me
[21:16] if you don't have utf-8 support, then it won't show correctly. it's only single quotation marks around the url
[21:18] hm, not very good first impression here
[21:18] wpull.url - WARNING - Unable to parse URL �https:/�: Hostname is empty: 'https:/'.
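A minimal sketch of the display problem described at 21:15-21:16, assuming the log wraps URLs in U+2018/U+2019 single quotation marks and that the viewing side cannot represent them. The URL and the log text here are made up for illustration and are not wpull's actual output; `render_for_ascii` is a hypothetical helper name.

```python
# Illustration only: the sample log line below is invented, not real wpull output.
LEFT_QUOTE = "\u2018"   # LEFT SINGLE QUOTATION MARK
RIGHT_QUOTE = "\u2019"  # RIGHT SINGLE QUOTATION MARK


def render_for_ascii(text: str) -> str:
    """Show how text looks once it is forced through an ASCII-only encoding."""
    # errors="replace" substitutes "?" for every character ASCII cannot encode.
    return text.encode("ascii", errors="replace").decode("ascii")


log_line = f"INFO - Fetched {LEFT_QUOTE}http://example.com/{RIGHT_QUOTE}"

# Each curly quote collapses to a single question mark.
print(render_for_ascii(log_line))

# A UTF-8 round trip keeps the quotation marks intact.
print(log_line.encode("utf-8").decode("utf-8") == log_line)
```

Under those assumptions the URL itself is untouched and only the quoting characters are lost, which would be consistent with a question mark appearing directly before "http".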
[21:18] and eg ERROR Fetching \u2018http://www.spiegel.de/forum/wirtschaft/renditehunger-sat1-streicht-beliebte-nachrichten-%96-ab-sofort-thread-1886-1.html\u2019 encountered an error: Too many redirects.
[21:18] oh wait, that second one is correct
[21:20] my setup http://pastebin.com/bdhAhhmb
[21:23] *** mistym has quit IRC (Remote host closed the connection)
[21:23] quite fast though :=
[21:27] *** yan has quit IRC (Quit: bye)
[21:34] *** BlueMaxim has joined #archiveteam-bs
[21:38] *** mistym has joined #archiveteam-bs
[21:56] i'll see if it melts my vps overnight ;)
[21:56] *** schbirid has quit IRC (Leaving)
[22:39] *** Start has joined #archiveteam-bs
[22:52] https://i.imgur.com/y3K7D9Il.jpg
[22:56] schbirid: they're both fine
[22:56] oh wait, left
[22:56] oh well
[23:48] i got my smash bros soundtrack cd!
[23:49] but of course nintendo shipped it in cheap plastic wrap so the front of the case is all cracked :(
[23:53] smash bros indeed