[00:10] *** MMovie has quit IRC (Read error: Operation timed out)
[00:11] *** MMovie has joined #archiveteam
[00:12] johtso: ping ivan` about archiving youtube
[00:14] If it involves any kind of manual involvement it's probably not practical, it's a 24 hour live stream :)
[00:14] I'm sure they'll archive it..
[00:15] johtso, look at using livestreamer and vlc media plater
[00:15] player
[00:34] *** MMovie has quit IRC (Read error: Operation timed out)
[00:35] *** MMovie has joined #archiveteam
[00:37] I was wondering what the status of archiving fanfiction.net is? According to http://www.archiveteam.org/index.php?title=FanFiction.Net it's being saved, but I can't find out by who or where to.
[00:38] Last September someone scraped every story from it and put it up as a torrent.
[00:39] (Though they just saved each one as plain text, so maybe not the best job)
[00:39] You can find the magnet link here: https://www.reddit.com/r/DataHoarder/comments/3jl3qm/nearly_complete_archive_of_fanfictionnet/
[00:39] So it's not being actively archived now?
[00:40] Not by us, as far as I know
[00:43] got tagged as saved
[00:44] Hm, probably should be changed to {{partiallysaved}} in that case
[00:44] Do you know if anyone tossed the torrent onto IA?
[00:44] I did
[00:45] https://archive.org/details/fanfiction.net_2015_09
[00:46] ah, cool -- please do add that link to the wiki page
[00:48] also, if/when you get a chance, you could turn the link from the item description into a clickable link.
[00:49] How do you do that?
[00:49] *** ndiddy has quit IRC (Read error: Operation timed out)
[00:50] *** MMovie has quit IRC (Read error: Operation timed out)
[00:51] *** MMovie has joined #archiveteam
[00:52] MrRadar: the description can use HTML
[00:52] OK, I didn't know that
[00:52] i.e. http://blah.com
[00:52] yeah, one of many IA hidden features.
:-)
[00:52] IDK what sanitizing they do (not much, I'd guess)
[00:54] Example (just tested): https://archive.org/details/fav-jesse_w
[00:55] It looks like we did a scrape of the site back in 2014. Where did that data end up?
[00:55] I'd like to add a link to it too
[00:55] *** robink has quit IRC (Ping timeout: 190 seconds)
[00:56] Hm, this is ... mildly alarming: https://catalogd.archive.org/log/466976952
[00:56] Apparently the lack of sanitizing they do extends to these pages. :-)
[00:56] Oh, yeah, that's bad
[00:57] I'd hit info@
[00:57] *** robink has joined #archiveteam
[00:57] They probably just need to put an htmlspecialchars() call around the output
[01:00] Hmm. It looks like that Fanfiction.net scrape was also uploaded by its original creator.
[01:00] *** robink has quit IRC (Ping timeout: 190 seconds)
[01:00] Now that I do a search for it
[01:01] The only difference is that mine has the original torrent file
[01:01] Does the IA dedup by hash? I'd hate to have them storing this data 4 times
[01:01] *** robink has joined #archiveteam
[01:02] MrRadar: that was me, i think, and it now has an inventory file
[01:02] they're not storing it 4 times
[01:02] they're storing it 8 times
[01:02] at least
[01:02] *** robink has quit IRC (Remote host closed the connection)
[01:03] it would be interesting to see if any of those copies have the original version of Fifty Shades of Grey
[01:03] if none of them do that is a serious mark against all of us
[01:03] bsmith093, this one: https://archive.org/details/FanfictionNearlyCompleteArchive ?
[01:03] MrRadar: yes, thst
[01:03] that
[01:04] Haha, I should have searched first before uploading it
[01:04] Did you create that scrape originally?
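A minimal sketch of the escaping fix suggested at 00:57, written in Python rather than with the PHP htmlspecialchars() call named in the chat. The function name render_log_value and the sample description are hypothetical; how the catalogd log pages are actually generated is not known from this log.

```python
import html


def render_log_value(value: str) -> str:
    """Escape a user-supplied metadata value before placing it in an HTML page.

    html.escape() replaces &, <, > and (with quote=True) both quote characters,
    which is roughly what PHP's htmlspecialchars() does.
    """
    return html.escape(value, quote=True)


# Hypothetical item description containing markup, as IA descriptions may.
description = '<a href="http://blah.com">http://blah.com</a>'
print(render_log_value(description))
```

With escaping applied, the log page would show the markup as text instead of rendering it.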
[01:04] MrRadar: which one did you do
[01:04] I uploaded it to https://archive.org/details/fanfiction.net_2015_09
[01:04] MrRadar: using fanficfare running through a list of all id numbers
[01:05] OK
[01:05] you may want the inventory file to be able to search that
[01:05] How should I credit you in my copy of the upload?
[01:06] list the other link. it's fine i just used another project somebody else built to scrape all of it
[01:07] MrRadar: while we're on the subject, how the hell do i extract one file from this archive. should it be taking forever?
[01:08] Unlike zip and 7z files, TAR files don't have a catalog so if you want to extract a file it has to scan through the whole thing
[01:08] To find it
[01:08] ugh
[01:10] MrRadar: if you want to rebuild it into a 7z file, you can, i would really like something i can extract a given file from in less than an hour
[01:15] yipdw, the original fifty shades was gone from ff by 2010. Wayback machine might have a copy, but it's robots.txt-excluded. There are still copies of it floating around the web tho, if you know the original title.
[01:15] so much for our efforts
[01:16] yipdw: if they didn't throttle so hard, we could have gotten all of it in like a week
[01:20] I kid
[01:20] it's just that I've encountered situations where it's like "oh I wonder if we got that" and yet in the terabytes we drag in daily
[01:20] nope
[01:21] this happens sometimes in archivebot crawls
[01:21] maybe that's just what happens when you deal with something as ineffably huge as the web
[01:21] Updated the Fanfiction.net page with references to the AT's 2012 scrape and bsmith093's 2015 one
[01:21] *** dserodio has quit IRC (Read error: Operation timed out)
[01:21] If it's any consolation, we likely have good representative samples of the early age of dinosaur erotica, and... whatever the next terrible trend will be.
[01:22] ARCHIVE TEAM TRENDSETTIN' 2016 \m/
[01:23] simply amazing
[01:23] snape:well, thank FSM we have that!
;)
[01:24] *** dserodio has joined #archiveteam
[01:27] *** dashcloud has quit IRC (Read error: Operation timed out)
[01:27] *** MMovie has quit IRC (Read error: Operation timed out)
[01:28] *** MMovie has joined #archiveteam
[01:28] OK, reported the lack of escaping to info@
[01:30] *** dashcloud has joined #archiveteam
[01:31] JesseW: I checked that log, what was wrong?
[01:32] bsmith093: note the 2nd time the description was shown, after [description] =>
[01:32] It uses the actual HTML, not escaped (as it is above)
[01:32] (and again below, by "with value:"
[01:33] oh, i see it now
[01:33] Also, is there an archive format that stores an index, because apparently tar doesn't
[01:35] cdx
[01:35] Well, that's for WARCs
[01:35] For general files .zip or .7z are the go-to
[01:36] is there a thing i can use without having to un- and re-compress 300GB of files?
[01:37] Not really. Part of it is also that if your .tar is also gzipped you would need to decompress the entire gzip stream up to each file
[01:37] Even if you had an index
[01:37] .tar.gz is not designed for random access
[01:38] anyone feel like being awesome? when i created that file, i thought it would actually be searchable easily.
[01:39] On a related note, I wonder if there's a list somewhere of the oldest continuously-active porn sites. The oldest one I could remember, from 2000, seems to have disappeared sometime last year. :/
[01:41] snape: there may have been such a list on Wikipedia -- although it quite likely has been deleted by now; but if you look through old revisions, you may be able to find it.
[01:42] bsmith093: once I get done with the IA census (which should be pretty soon -- mostly just waiting on jjake uploading the results) I'm glad to recompress the fanfiction tarball as a zip.
[01:42] I have a good pipe, and enough free space.
[01:47] *** dashcloud has quit IRC (Read error: Operation timed out)
[01:47] JesseW: thanks so much!
BTW when I made the file, every single gui tool was choking on a folder that big, and i didn't actually have the space to store the final tar, so i compressed on the fly to fos. when i created the file, apparently i pushed in the whole path of the folder, so when you rebuild it, could you start with the Fanfiction folder, buried in home/Desktop etc. that was my bad.
[01:49] yeah
[01:49] any way i have plenty of space now, mostly because i finally dumped the uncompressed files, and that's how i got started looking for something to search a tar file
[01:49] I think my debian box should be OK handling it.
[01:49] Thank you for babysitting the script to make it!
[01:50] *** dashcloud has joined #archiveteam
[01:51] MrRadar: I improved the link to the 2012 scrape.
[01:51] np, i wasn't doing much anyway! also the inventory file is here, and you'll see the problem immediately https://archive.org/download/FanfictionNearlyCompleteArchive/inventory.txt
[01:51] Thanks, JesseW
[01:52] argh, the *inventory* is nearly 800MB!
[01:53] The pipe I'm on right /now/ isn't so good -- I'll download that later. :-)
[01:55] JesseW:hey i had to leave that uncompressed, the whole point is so google can find it.
[02:10] *** MMovie has quit IRC (Read error: Operation timed out)
[02:12] *** MMovie has joined #archiveteam
[02:26] *** bsmith093 has quit IRC (http://www.kiwiirc.com/ - A hand crafted IRC client)
[02:35] *** JesseW has quit IRC (Quit: Leaving.)
[02:37] *** mafrasi2_ has quit IRC (Read error: Connection reset by peer)
[02:38] *** JesseW has joined #archiveteam
[02:38] *** mafrasi2 has joined #archiveteam
[02:38] *** yipdw_ has joined #archiveteam
[02:41] *** yipdw has quit IRC (Ping timeout: 506 seconds)
[02:54] *** MMovie has quit IRC (Read error: Operation timed out)
[02:55] *** MMovie has joined #archiveteam
[03:00] *** JesseW has quit IRC (Quit: Leaving.)
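The tar-to-zip rebuild discussed above (01:08 through 01:47) could be sketched roughly as follows. This is an illustration under assumptions, not the script anyone in the channel actually ran: the function name tar_to_zip and the strip_prefix parameter are hypothetical, and the real archive's path layout is only known from the description "buried in home/Desktop etc."

```python
import shutil
import tarfile
import zipfile


def tar_to_zip(tar_path: str, zip_path: str, strip_prefix: str = "") -> int:
    """Stream every regular file from a (possibly compressed) tar into a zip.

    The tar is read once, front to back ("r|*" is tarfile's streaming mode),
    so nothing is unpacked to disk. The zip's central directory then allows
    a single file to be extracted later without scanning the whole archive.
    Returns the number of files copied.
    """
    copied = 0
    with tarfile.open(tar_path, "r|*") as tar, zipfile.ZipFile(
        zip_path, "w", compression=zipfile.ZIP_DEFLATED, allowZip64=True
    ) as zf:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, devices
            name = member.name
            if strip_prefix and name.startswith(strip_prefix):
                name = name[len(strip_prefix):]
            name = name.lstrip("/")
            src = tar.extractfile(member)
            # Members larger than 2 GiB would also need force_zip64=True here.
            with zf.open(name, "w") as dst:
                shutil.copyfileobj(src, dst)
            copied += 1
    return copied
```

A call such as tar_to_zip("fanfiction.tar.gz", "fanfiction.zip", strip_prefix="home/Desktop/") (file names made up) would leave entries starting at the Fanfiction folder, and zipfile.ZipFile.read() on the result can then pull out one story directly.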
[03:05] *** MMovie has quit IRC (Read error: Operation timed out)
[03:06] *** MMovie has joined #archiveteam
[03:11] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[03:24] *** MMovie has quit IRC (Read error: Operation timed out)
[03:25] *** MMovie has joined #archiveteam
[03:42] *** MMovie has quit IRC (Read error: Operation timed out)
[03:43] *** Boppen has joined #archiveteam
[03:43] *** MMovie has joined #archiveteam
[03:48] *** bwn has quit IRC (Ping timeout: 492 seconds)
[03:58] *** ndiddy has joined #archiveteam
[04:03] *** xXx_ndidd has joined #archiveteam
[04:12] *** xXx_ndidd has quit IRC (Read error: Connection reset by peer)
[04:13] *** xXx_ndidd has joined #archiveteam
[04:16] *** Boppen has quit IRC (Ping timeout: 200 seconds)
[04:16] *** ndiddy has quit IRC (Read error: Operation timed out)
[04:21] *** JesseW has joined #archiveteam
[04:26] A games database that is looking for a home -- http://forum.kodi.tv/showthread.php?tid=261575 someone should suggest archive.org for them.
[04:34] *** MMovie has quit IRC (Read error: Operation timed out)
[04:35] Nemo_bis: it looks like the last wikiteam dump of the archiveteam wiki was in october 2015 -- could you make another one?
[04:36] (I'm asking you because you are listed as the uploader for https://archive.org/details/wiki-archiveteamorg )
[04:36] *** MMovie has joined #archiveteam
[04:38] *** bsmith093 has joined #archiveteam
[04:49] *** MMovie has quit IRC (Read error: Operation timed out)
[04:50] *** MMovie has joined #archiveteam
[05:07] *** MMovie has quit IRC (Read error: Operation timed out)
[05:08] *** MMovie has joined #archiveteam
[05:10] *** xXx_ndidd has quit IRC (Read error: Connection reset by peer)
[05:21] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[05:24] *** MMovie has quit IRC (Read error: Operation timed out)
[05:25] *** MMovie has joined #archiveteam
[05:30] *** Sk1d has joined #archiveteam
[05:41] *** MMovie has quit IRC (Read error: Operation timed out)
[05:43] *** MMovie has joined #archiveteam
[05:45] *** myself has joined #archiveteam
[05:46] yo bitches
[05:46] http://www.ridethemindway.com/phones/
[05:47] no idea where that came from or how long it'll be up, but something tells me, not forever
[05:47] *** myself has quit IRC (Client Quit)
[05:50] *** aksel has joined #archiveteam
[05:50] Why did stypi shut down
[05:50] ?
[05:50] Hello?
[05:50] Is anyone here?
[05:50] Why did Code.Stypi Shutdown
[05:51] .
[05:51] *** aksel has quit IRC (Client Quit)
[06:01] *** atank1 has joined #archiveteam
[06:01] hello
[06:03] I'd never heard of Code.Stypi before.
[06:03] Code.stypi.com was an online source project that allowed programming languages on docs that you could edit with friends
[06:04] apparently I forgot it, as I created a wiki page for it back in Aug 2015: http://archiveteam.org/index.php?title=Stypi&action=history
[06:05] Wait you work with the team?
[06:05] It doesn't look like we made any specific effort to save it.
[06:05] What happened to it?
[06:06] Why did they shut it down?
[06:06] Do you know why they shut Code.Stypi.com Down?
[06:06] Apparently whoever was running it decided to stop.
I don't see any farewell notice, although apparently there was something saying it was going to die on Sept 3, 2015.
[06:07] oh
[06:07] is there an Archive with the website's source? like the code i would love to bring it back as my students are quite depressed.
[06:08] *** VADemon has quit IRC (Quit: left4dead)
[06:08] I don't think so. It doesn't look like the source for it was available.
[06:09] I decided to try and talk to someone, I know it went down a while ago but. i want to try and get it back.
[06:09] Shit.
[06:09] Well uhh
[06:09] *** MMovie has quit IRC (Read error: Operation timed out)
[06:09] do you know who created it?
[06:09] I think there are alternatives, though.
[06:09] Such as?
[06:09] Collaborative text editing, certainly -- and I think collaborative programming too. Not sure of names offhand, though.
[06:10] *** MMovie has joined #archiveteam
[06:10] I'm digging around in the Wayback Machine copy, to see if I can dig up any relevant contact info.
[06:11] You can do the same.
[06:11] ?
[06:11] e.g. https://web.archive.org/web/20130514113228/https://www.stypi.com/press
[06:11] ._.
[06:12] I was asking around and someone said they would sell the source code for a couple thousand
[06:12] It looks like they were owned by Salesforce. So that'd be who you should contact.
[06:12] Bullshit lol
[06:12] Please let us know if you have any luck.
[06:12] Ok
[06:12] but they were acquired back in 2012
[06:12] so it wasn't the acquisition that killed them
[06:13] and this gives their address (back in 2012) https://web.archive.org/web/20150320123654/https://code.stypi.com/privacy
[06:14] apparently someone i know actually knows where a source can be located
[06:14] neat!
[06:14] if you get a hold of it, please upload a copy to the Internet Archive
[06:15] Their tweets (all 42 of them) are hidden: https://twitter.com/stypi
[06:19] according to https://www.technologyreview.com/s/425690/google-wave-reincarnated/ the founders' names were: Byron Milligan and Jason Chen.
[06:20] ?
[06:20] *** MMovie has quit IRC (Read error: Operation timed out)
[06:20] You could try emailing/tweeting at them.
[06:22] *** Stiletto has joined #archiveteam
[06:22] *** MMovie has joined #archiveteam
[06:26] ?
[06:26] grr
[06:28] ?
[06:28] He lied to me
[06:29] your contact who said they had a copy of the source code? damn, that sucks
[06:29] Yep
[06:30] I can't say I'm surprised, but I'm sorry to hear it.
[06:34] Actually
[06:34] Since all my students' robotic programming is gone...
[06:34] well
[06:35] i guess i just have to break the news
[06:35] they wanted me to grab their code for them
[06:36] oh dear
[06:36] i hope this does not get me fired
[06:37] I hope not. That's an awful place to get stuck in.
[06:37] Beware The Cloud.
[06:37] *** dashcloud has quit IRC (Read error: Operation timed out)
[06:37] Always Have Multiple Local Backups
[06:38] *** dashcloud has joined #archiveteam
[06:39] We did
[06:40] but last week the Servers got attacked by Cryptowall
[06:40] OH FUCK. That REALLY sucks.
[06:40] ik
[06:41] BTW, thank you *VERY MUCH* for telling your story here. It's shit like this that demonstrates why what we do (or failed to do in this case) is important.
[06:42] anyways imma go
[06:42] JesseW: not to distract from the suck of cryptowall, but did you ever get that tar --> zip thing going?
[06:43] fuck im lagging my dads internet
[06:43] i should get off
[06:43] atank1: hope it goes well
[06:43] My dad aint happy
[06:43] It'll be a few days -- I want to keep my new IA census workstuff around until jjake gets the census stuff uploaded.
[06:44] I borrow the internet from my dad next door lol
[06:44] JesseW: k then, thanks
[06:44] cya
[06:44] bsmith093 but I should be able to start downloading the file at least
[06:44] *** atank1 has quit IRC ()
[06:44] Well, that was a damned sob story. :-(
[06:44] yeah that sucks, who *lies* about having source code?
[06:51] "Most importantly, Stypi will continue to be the Stypi you know.
Our users will continue to have access to this great service, community, and innovation."
[06:51] nice
[06:51] Where is that from?
[06:51] https://web.archive.org/web/20130514120823/http://blog.stypi.com/
[06:52] more specifically https://web.archive.org/web/20130325111746/http://blog.stypi.com/2012/05/stypi-joins-salesforce-com/
[06:52] well, it wasn't being bought that killed them -- they didn't die till 3 years later.
[06:53] bsmith093: fanfiction download started.
[06:53] true but Bram Moolenaar switching jobs to Google didn't kill vim 3 years later
[06:54] JesseW: thanks. so, Stypi didn't actually say they were dumping anything. they usually make a point of that.
[06:54] startups in general, i mean
[06:54] and that illustrates the difference between a piece of FOSS standalone software and a proprietary service
[06:55] I mean it's nothing that nobody in here doesn't know
[06:55] bsmith093: they did say they were deleting it: "All documents that have not been downloaded to an archive by that time will be deleted. "
[06:55] it really sucks for atank1 and unless someone here happens to have an archive they may just be out of luck
[06:56] bsmith093: ETA on the fanfict download is 2 days, 5 hours. :-)
[06:56] JesseW: just curious, what isp do you have?
[06:57] there's also a torrent file
[06:58] bsmith093: Wave G in Seattle
[06:59] Hm, I suppose I'll switch over to the torrent.
[06:59] JesseW: never heard of them, any good?
[06:59] They used to be called CondoInternet
[07:00] I've been very happy with them. Only complaint is that they keep sending me junk mail offering me a discount to sign up -- after I've already signed up.
:-)
[07:00] how fast
[07:01] also will someone with ops please add http://www.ridethemindway.com/phones/ to archivebot yipdw_ SketchCow ersi
[07:01] bsmith093 already in there
[07:02] check the dashboard
[07:02] bsmith093: I pay for 100 Mbps
[07:02] great, it looks like 80s era phone docs
[07:02] those are rare
[07:03] so do i, new isp, faster than twc, for much cheaper
[07:03] It's already grabbed 5 GB, with about 1000 files to go
[07:03] I'm just delighted to not have to deal with the big ISPs. Those folks are simply unpleasant to deal with.
[07:05] mine's been down once since i got them, like ~2 years ago, for maybe a half hour, when i called them, they ACTUALLY TOLD ME WHY!!
[07:05] that is excellent, yeah
[07:07] JesseW: also grab the inventory file, that might not be in the torrent yet, i don't know how fast that updates
[07:09] It's not in the torrent I'm using (with hash d934709d1c7f1bf26d826718804de5f7a53757dc)
[07:10] it's on the page though.
[07:10] I know. I'll grab it afterward -- I mean, I can regenerate it myself once I have the data. :-)
[07:13] The torrent ETA is 1 day, 19 hours right now.
[07:13] or 1 day, 14 hours.
[07:27] *** vitzli has joined #archiveteam
[07:32] *** MMovie has quit IRC (Read error: Operation timed out)
[07:34] *** MMovie has joined #archiveteam
[07:37] *** JesseW has quit IRC (Quit: Leaving.)
[07:43] *** dashcloud has quit IRC (Read error: Operation timed out)
[07:45] *** DFJustin has quit IRC (Remote host closed the connection)
[07:46] *** dashcloud has joined #archiveteam
[07:50] *** metalcamp has joined #archiveteam
[08:00] *** DFJustin has joined #archiveteam
[08:00] *** MMovie has quit IRC (Read error: Operation timed out)
[08:01] *** MMovie has joined #archiveteam
[08:07] *** brayden has quit IRC (Read error: Connection reset by peer)
[08:07] *** brayden has joined #archiveteam
[08:17] *** MMovie has quit IRC (Read error: Operation timed out)
[08:18] *** MMovie has joined #archiveteam
[08:36] *** MMovie has quit IRC (Read error: Operation timed out)
[08:38] *** MMovie has joined #archiveteam
[08:54] *** MMovie has quit IRC (Read error: Operation timed out)
[08:56] *** MMovie has joined #archiveteam
[09:07] *** Tomcat_ has joined #archiveteam
[09:19] How can I add into wget -H -D domains, that looks like this - imagesX.fotosik.pl, where X is number from 1 to 99?
[09:25] *** MMovie has quit IRC (Read error: Operation timed out)
[09:27] *** MMovie has joined #archiveteam
[09:29] *** Zei-Pii has joined #archiveteam
[09:38] *** vitzli has quit IRC (Leaving)
[09:41] *** MMovie has quit IRC (Read error: Operation timed out)
[09:42] *** MMovie has joined #archiveteam
[09:48] *** bwn has joined #archiveteam
[09:58] *** MMovie has quit IRC (Read error: Operation timed out)
[09:58] *** MMovie has joined #archiveteam
[10:04] *** hictooth has quit IRC (Ping timeout: 255 seconds)
[10:13] https://twitter.com/trulloapp/status/702225155464900608
[10:13] "Thank you all for your support & app love. Sadly, we're shutting down Trullo. We are grateful for your contributions and we'll miss you.
:("
[10:13] well
[10:13] looks like they didn't waste any time
[10:13] DNS doesn't resolve anymore
[10:14] wow
[10:14] https://www.producthunt.com/tech/trullo
[10:16] well
[10:16] this has got to be one of the most dickish shutdowns
[10:16] I think they just beat Yahoo
[10:16] shutdown announcement with no notice
[10:16] DNS gone 5 days later
[10:22] *** hictooth has joined #archiveteam
[10:23] *** MMovie has quit IRC (Read error: Operation timed out)
[10:23] Is there some kind of “DNS archive” you could go back to, fetch the IP address and see if the server still works?
[10:23] *** hictooth has quit IRC (Client Quit)
[10:25] *** MMovie has joined #archiveteam
[10:25] PurpleSym: potentially, sec
[10:26] PurpleSym: https://www.robtex.com/?dns=trullo.com
[10:26] so yeah
[10:26] 52.24.188.223 and 52.24.194.8
[10:27] Also: https://dnshistory.org/dns-records/trullo.com
[10:27] AWS
[10:27] IPs non-responsive
[10:27] PurpleSym: robtex is more complete :P
[10:28] Indeed.
[10:28] I <3 robtex
[10:28] the NSA does too, for obvious reasons
[10:29] We should get a copy.
[10:29] Anyway, was worth a shot…
[10:30] PurpleSym: copy of?
[10:32] Robtex.
[10:32] heh.
[10:32] robtex is big
[10:32] I still need to eventually talk to the guy and see if some kind of feed can be negotiated
[10:32] he's supposedly working on an API for 'qualified organizations' but it's not entirely clear to me what that would mean
[10:33] but
[10:33] -bs
[10:45] *** MMovie has quit IRC (Read error: Operation timed out)
[10:46] *** MMovie has joined #archiveteam
[11:03] *** MMovie has quit IRC (Read error: Operation timed out)
[11:04] *** MMovie has joined #archiveteam
[11:16] *** winterfox has quit IRC (Remote host closed the connection)
[11:21] *** MMovie has quit IRC (Read error: Operation timed out)
[11:23] *** MMovie has joined #archiveteam
[11:41] *** MMovie has quit IRC (Read error: Operation timed out)
[11:43] *** MMovie has joined #archiveteam
[11:57] *** schbirid has joined #archiveteam
[11:59] *** MMovie has quit IRC (Read error: Operation timed out)
[12:00] *** MMovie has joined #archiveteam
[12:13] *** philpem has joined #archiveteam
[12:16] *** MMovie has quit IRC (Read error: Operation timed out)
[12:17] *** MMovie has joined #archiveteam
[12:34] *** MMovie has quit IRC (Read error: Operation timed out)
[12:35] *** MMovie has joined #archiveteam
[12:53] *** MMovie has quit IRC (Read error: Operation timed out)
[12:54] Is someone interested in scanning FTPs for the FTP project?
[12:54] Instructions and FTPs are here http://archiveteam.org/index.php?title=FTP/List
[12:54] *** MMovie has joined #archiveteam
[12:54] This is only scanning the FTP and creating a list of items for the grab. This won't take a lot of diskspace.
[12:56] It might take a lot of time, depending on the number of files and the speed of the FTP
[13:03] *** MMovie has quit IRC (Read error: Operation timed out)
[13:05] *** MMovie has joined #archiveteam
[13:10] *** snape has quit IRC (Hey! Where'd my controlling terminal go?)
[13:18] Shall I kick a scan off on ftp://ftp.cup.cam.ac.uk - seems to have a lot of stuff on books published by Cambridge University Press
[13:20] nvm, it's done
[13:21] or it isn't
[13:36] *** MMovie has quit IRC (Read error: Operation timed out)
[13:37] *** philpem has quit IRC (Ping timeout: 260 seconds)
[13:37] *** MMovie has joined #archiveteam
[13:55] *** MMovie has quit IRC (Read error: Operation timed out)
[13:56] *** MMovie has joined #archiveteam
[14:13] *** MMovie has quit IRC (Read error: Operation timed out)
[14:14] arkiver: Would you accept the output of `ncftpls -R`?
[14:15] I'm not sure what kind of output that gives
[14:15] *** MMovie has joined #archiveteam
[14:15] URL and filesize?
[14:15] if yes, then I can convert it, if no, then no
[14:16] Though using the ftp-queue scripts would be best for this (scripts will be more optimized in the future)
[14:16] Output looks like this: http://pastebin.com/t6vcPHbD
[14:17] Looks like URL can be generated and size is also there
[14:17] so I should be able to convert it
[14:17] Why would you use that command rather than the ftp-queue script?
[14:18] I don’t see why a script needed to be written for that in the first place :)
[14:19] The script will make sure only new files or size-changed files are added to the itemlists
[14:20] Previously scanned FTPs are also in /archive/, so they can be used for that if they are scanned again
[14:20] It creates smaller lists of 200 MB of FTP files
[14:20] `cat old new | sort | uniq`
[14:21] Also tests for the server response if a file or folder does not exist
[14:30] *** megaminxw has quit IRC (Quit: Leaving.)
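A rough sketch of the two behaviours described for the ftp-queue script above (14:19 and 14:20): keeping only new or size-changed files, and splitting the result into lists of about 200 MB. It is not the actual ftp-queue code; the function names and the {url: size} mapping are assumptions made for illustration. Note that `cat old new | sort | uniq` produces the union of both listings, while the step described here is closer to a difference against the previous scan.

```python
def new_or_changed(old: dict, new: dict) -> dict:
    """Return entries of `new` whose URL is missing from `old` or whose size differs.

    Both arguments map a file URL to its size in bytes (a hypothetical format).
    """
    return {url: size for url, size in new.items() if old.get(url) != size}


def chunk_by_size(entries: dict, limit: int = 200 * 1024 * 1024) -> list:
    """Greedily split URLs into lists whose total size stays within `limit`.

    A single file larger than `limit` ends up in a list of its own.
    """
    chunks, current, total = [], [], 0
    for url, size in sorted(entries.items()):
        if current and total + size > limit:
            chunks.append(current)
            current, total = [], 0
        current.append(url)
        total += size
    if current:
        chunks.append(current)
    return chunks
```

Feeding the previous scan and a fresh listing through new_or_changed() and then chunk_by_size() would give item lists containing only files that still need grabbing.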
[14:30] *** MMovie has quit IRC (Read error: Operation timed out)
[14:31] *** MMovie has joined #archiveteam
[14:35] *** ohhdemgir has quit IRC (Read error: Operation timed out)
[14:47] *** MMovie has quit IRC (Read error: Operation timed out)
[14:48] *** MMovie has joined #archiveteam
[14:48] *** zhongfu has quit IRC (Remote host closed the connection)
[14:59] *** MMovie has quit IRC (Read error: Operation timed out)
[15:00] *** test_ has joined #archiveteam
[15:00] uhh hello?
[15:00] hi
[15:01] can i request something for deletion
[15:01] personal info
[15:01] Requests for deletion of something should be sent to info@archive.org
[15:01] ok thanks
[15:01] *** MMovie has joined #archiveteam
[15:01] will do, bye
[15:01] *** test_ has quit IRC (Client Quit)
[15:11] *** scyther has joined #archiveteam
[15:31] *** MMovie has quit IRC (Read error: Operation timed out)
[15:33] *** MMovie has joined #archiveteam
[15:50] *** MMovie has quit IRC (Read error: Operation timed out)
[15:52] *** MMovie has joined #archiveteam
[16:00] *** snape has joined #archiveteam
[16:06] *** MMovie has quit IRC (Read error: Operation timed out)
[16:07] *** MMovie has joined #archiveteam
[16:25] *** MMovie has quit IRC (Read error: Operation timed out)
[16:26] *** MMovie has joined #archiveteam
[16:30] *** ats has quit IRC (Quit: Let's see if Linux 4.4.3 has working NFS again...)
[16:36] *** ats has joined #archiveteam
[16:40] *** Zei-Pii has quit IRC (Read error: Connection reset by peer)
[16:41] *** MMovie has quit IRC (Read error: Operation timed out)
[16:42] *** MMovie has joined #archiveteam
[16:52] arkiver: whatever happened to the dump of open FTPs that I had a while ago?
[16:52] :p
[16:54] *** philpem has joined #archiveteam
[17:10] *** MMovie has quit IRC (Read error: Operation timed out)
[17:12] *** MMovie has joined #archiveteam
[17:30] *** MMovie has quit IRC (Read error: Operation timed out)
[17:30] *** scyther_ has joined #archiveteam
[17:31] *** MMovie has joined #archiveteam
[17:32] *** scyther has quit IRC (Ping timeout: 250 seconds)
[17:45] joepie91: I'll look into that!
[17:47] *** MMovie has quit IRC (Read error: Operation timed out)
[17:49] joepie91: do you still have that dump?
[17:50] *** MMovie has joined #archiveteam
[17:52] arkiver: eh, might have, but my files are a mess atm
[17:59] *** zhongfu has joined #archiveteam
[18:00] arkiver: remind me of the filename?
[18:06] *** zhongfu has quit IRC (Remote host closed the connection)
[18:11] *** JesseW has joined #archiveteam
[18:15] *** metalcamp has quit IRC (Ping timeout: 252 seconds)
[18:16] *** zhongfu has joined #archiveteam
[18:17] *** MMovie has quit IRC (Read error: Operation timed out)
[18:18] *** MMovie has joined #archiveteam
[18:36] *** MMovie has quit IRC (Read error: Operation timed out)
[18:37] *** MMovie has joined #archiveteam
[18:41] *** scyther_ has quit IRC (Read error: Connection reset by peer)
[18:54] *** MMovie has quit IRC (Read error: Operation timed out)
[18:56] *** MMovie has joined #archiveteam
[19:04] *** godane has quit IRC (Read error: Operation timed out)
[19:07] *** victor has joined #archiveteam
[19:11] *** MMovie has quit IRC (Read error: Operation timed out)
[19:12] *** MMovie has joined #archiveteam
[19:29] *** MMovie has quit IRC (Read error: Operation timed out)
[19:31] *** MMovie has joined #archiveteam
[19:33] *** Infreq has quit IRC (Ping timeout: 258 seconds)
[19:34] *** Infreq has joined #archiveteam
[19:35] *** zino_ has joined #archiveteam
[19:37] *** Burak has quit IRC (Ping timeout: 255 seconds)
[19:38] *** schbirid has quit IRC (hub.efnet.us irc.Prison.NET)
[19:38] *** zino has quit IRC (hub.efnet.us irc.Prison.NET)
[19:38] *** vOYtEC has quit IRC (hub.efnet.us irc.Prison.NET)
[19:38] *** achip has quit IRC (hub.efnet.us irc.Prison.NET)
[19:42] *** schbirid2 has joined #archiveteam
[19:57] *** MMovie has quit IRC (Read error: Operation timed out)
[19:57] *** achip has joined #archiveteam
[19:58] *** MMovie has joined #archiveteam
[19:58] *** vOYtEC has joined #archiveteam
[20:02] *** Burak has joined #archiveteam
[20:11] *** MMovie has quit IRC (Read error: Operation timed out)
[20:12] *** ndiddy has joined #archiveteam
[20:12] *** megaminxw has joined #archiveteam
[20:13] *** MMovie has joined #archiveteam
[20:29] *** metalcamp has joined #archiveteam
[20:47] *** Burak has quit IRC (Ping timeout: 255 seconds)
[20:48] *** JesseW has quit IRC (Quit: Leaving.)
[20:48] *** MMovie has quit IRC (Read error: Operation timed out)
[20:49] *** MMovie has joined #archiveteam
[20:50] *** scyther has joined #archiveteam
[21:04] *** MMovie has quit IRC (Read error: Operation timed out)
[21:05] *** MMovie has joined #archiveteam
[21:06] *** Burak has joined #archiveteam
[21:22] *** MMovie has quit IRC (Read error: Operation timed out)
[21:23] *** MMovie has joined #archiveteam
[21:26] *** metalcamp has quit IRC (Ping timeout: 252 seconds)
[21:32] *** bwn has quit IRC (Ping timeout: 246 seconds)
[21:39] *** MMovie has quit IRC (Read error: Operation timed out)
[21:40] *** MMovie has joined #archiveteam
[21:43] *** Tomcat_ has quit IRC (Remote host closed the connection)
[21:51] *** Boppen has joined #archiveteam
[21:54] *** mismatch_ has joined #archiveteam
[22:03] *** schbirid2 has quit IRC (Quit: Leaving)
[22:07] *** Boppen has quit IRC (hub.se irc.du.se)
[22:11] *** bwn has joined #archiveteam
[22:25] *** Boppen has joined #archiveteam
[22:27] *** MMovie has quit IRC (Read error: Operation timed out)
[22:29] *** MMovie has joined #archiveteam
[22:30] *** JesseW has joined #archiveteam
[22:30] *** scyther has quit IRC (Read error: Connection reset by peer)
[22:30] *** bwn has quit IRC
(Read error: Operation timed out)
[22:34] *** scyther has joined #archiveteam
[22:44] *** Boppen has quit IRC (Ping timeout: 200 seconds)
[22:47] *** Boppen has joined #archiveteam
[22:53] *** Boppen has quit IRC (hub.se irc.du.se)
[22:56] *** megaminxw has quit IRC (Quit: Leaving.)
[22:59] *** mismatch_ has quit IRC (Ping timeout: 499 seconds)
[23:06] *** rduser has quit IRC (Ping timeout: 260 seconds)
[23:06] *** Rickster has quit IRC (Ping timeout: 260 seconds)
[23:08] *** Famicoman has quit IRC (Ping timeout: 260 seconds)
[23:09] *** Simpbrai_ has quit IRC (Remote host closed the connection)
[23:10] *** bauruine has quit IRC (Ping timeout: 260 seconds)
[23:10] *** mismatch_ has joined #archiveteam
[23:10] *** MMovie has quit IRC (Read error: Operation timed out)
[23:11] *** rduser has joined #archiveteam
[23:12] *** MMovie has joined #archiveteam
[23:12] *** bauruine has joined #archiveteam
[23:13] *** Rickster has joined #archiveteam
[23:14] *** Simpbrai_ has joined #archiveteam
[23:27] *** MMovie has quit IRC (Read error: Operation timed out)
[23:28] *** MMovie has joined #archiveteam
[23:43] BnA-Robin: if you're interested in another project to run you might like FTP
[23:43] restarted today and we don't have a lot of people running it yet
[23:46] *** Boppen has joined #archiveteam
[23:48] SketchCow: What do you think of saving LiveJournal? We can make it a long running project, maybe over a year, so it won't need a lot of resources
[23:49] If you give the go we'll have a project running soon for livejournal
[23:54] *** bwn has joined #archiveteam
[23:54] *** scyther has quit IRC (Quit: Leaving)