[00:00] the error was "500 internal server error"
[00:00] I get the feeling that it has to do with the item I'm trying to upload (or one specific file in it), however I uploaded many similar items just fine the last hour or so...
[00:01] is there a blacklist on filenames or any scans that are being done while I upload (as opposed to post-processing) that I'm triggering?
[00:03] Darkstar: just out of curiosity, when you look at http://catalogd.archive.org/catalog.php?justme=1 do you have a ton of outstanding jobs?
[00:04] it looks like things have been sluggish with lots of failed disks this week
[00:04] yeah, 2 jobs in queue (one waiting, one running), but they were from ~2h ago and I uploaded a few other items after that just fine...
[00:05] could be that there are globally too many jobs too
[00:05] somewhere in the S3 documentation is a place you can check your quota
[00:05] i've noticed derives taking a really long time lately
[00:05] also in the archivebot uploader there is code to do it
[00:05] I'm wondering because it seems to trigger only with that (slightly bigger) item I'm uploading (and by "bigger" I mean ~220 mb, nothing too fancy)
[00:05] derives have been taking a long time all month
[00:06] yep
[00:06] the item with the 2 jobs in my queue is ~23mb so I'm wondering why it takes so long
[00:06] *** kristian_ has joined #archiveteam
[00:06] computers are tired
[00:06] Darkstar: probably not - I upload 2-3GB files probably 50 times a day at least and it keeps chugging away.
[00:06] I uploaded a ~200mb one after that and that one seems to have cleared the queue already
[00:07] that's why I was wondering if it's maybe something with the item. but it's just a ZIP file, a few JPGs and a few disk image files (of which I uploaded a few already today without problems)
[00:09] I'll try uploading a different (smaller) item ...
[00:11] *** nertzy has joined #archiveteam
[00:14] okay, that worked...
[00:15] strange. I don't get it...
[00:19] ok, let's try again
[00:20] I changed the order of the files in the uploader now, maybe I can find out if it stops at a specific file or something
[00:21] if it doesn't work I'll try again tomorrow by uploading only one file and adding the others one by one later...
[00:24] strange. now it worked
[00:25] oh well. probably cosmic rays or something :)
[00:25] I'll keep an eye on my job queue though, to see if the stuck tasks are getting through by tomorrow. otherwise I'll probably re-derive that one item
[00:25] *** i336_ has joined #archiveteam
[00:28] oh, if you aren't setting the content-length and content-md5 headers, do so
[00:31] huh? I'm using the web-based uploader, I would think that it sets these headers correctly by default?
[00:31] aah, likely
[00:31] I guess the web uploader uses the S3 API now then. cool.
[00:33] *** Kironide has joined #archiveteam
[00:35] yeah, looks like it (from the error message). I have yet to find a nice (and usable) tool to upload items from the command line
[00:35] are there any good tools available for backing up your Facebook activity?
[00:35] - the official tool doesn't download everything (e.g. your comments in other places)
[00:36] - digi.me / socialsafe is just horrible to use
[00:36] they all require me to either specify the metadata in some archaic CSV file (where it's not at all clear how to use cr/lf), or plain just don't work
[00:39] Darkstar: consider basing one off of my IA S3 uploader in github.com/falconkirtaran/ArchiveBot
[00:57] *** nertzy has quit IRC (Read error: Operation timed out)
[01:16] *** ravetcofx has joined #archiveteam
[01:43] *** pizzaiolo has quit IRC (Remote host closed the connection)
[01:52] *** Swizzle has joined #archiveteam
[01:54] *** dashcloud has quit IRC (Read error: Operation timed out)
[01:58] *** dashcloud has joined #archiveteam
[01:59] *** odemg has quit IRC (Remote host closed the connection)
[01:59] *** odemg has joined #archiveteam
[02:03] *** kristian_ has quit IRC (Quit: Leaving)
[02:06] Kironide: the only thing I find bad about it is its scheduler sometimes goes off whack but I haven't seen anything other than standard api limits that turn it into shite
[02:09] wp494, I have lots of trouble exporting my data elsewhere, I have to do it in very small chunks or the program locks up
[02:09] and the export formats (CSV and PDF) are both very user-unfriendly
[02:10] presumably I could look into the database files directly, except I read online that digi.me's database files are encrypted with something that isn't your password
[02:12] *** dashcloud has quit IRC (Read error: Operation timed out)
[02:13] do you know of any better solutions? I've failed to find any
[02:15] not that I've seen
[02:17] *** dashcloud has joined #archiveteam
[02:19] also it seems that it doesn't download comments you've made on other people's posts
[02:23] *** Hobart has joined #archiveteam
[02:28] *** VADemon has quit IRC (Quit: left4dead)
[02:31] *** BlueMaxim has joined #archiveteam
[02:36] *** dashcloud has quit IRC (Read error: Operation timed out)
[02:41] *** dashcloud has joined #archiveteam
[03:09] *** Swizzle has quit IRC (Quit: Leaving)
[03:11] *** RetroRomp has joined #archiveteam
[03:12] Uh... How do we notify you guys of whole sites and forums that are in danger?
[03:12] Anyone?
[03:13] In here
[03:13] feel free to tell us
[03:13] ringplus.net is an MVNO that is soon to be shut down.
[03:14] Currently they are migrating all of their users to another service in preparation for it, but there are several tens of thousands of messages plus their cell plans were unique.
[03:15] Ah, I just checked and we're on that already with ArchiveBot.
[03:15] You can have a look at the dashboard http://dashboard.at.ninjawedding.org/3
[03:15] Great! Even social.ringplus.net?
[03:15] that specifically.
[03:16] Why did I even worry? Do you guys need / want the user dashboard that is only available behind an account?
[03:20] It's probably a bit out of the scope, the ArchiveBot job is just grabbing all the available public data.
[03:21] Is there much significance to it?
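Editor's sketch for the 00:28 advice about setting the content-length and content-md5 headers on S3-style uploads. This is not taken from any uploader mentioned in the channel; it only shows how the two header values are conventionally computed for a local file, assuming the endpoint follows the usual HTTP convention (RFC 1864) in which Content-MD5 is the base64 encoding of the raw MD5 digest rather than the hex string. The function name `upload_headers` and the block size are hypothetical.

```python
import base64
import hashlib
import os


def upload_headers(path, block_size=1 << 20):
    """Return Content-Length and Content-MD5 header values for a local file.

    Hypothetical helper: the name and block size are illustrative only.
    """
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in blocks so large files are never held in memory at once.
        for block in iter(lambda: fh.read(block_size), b""):
            md5.update(block)
    return {
        "Content-Length": str(os.path.getsize(path)),
        # Base64 of the raw 16-byte digest, not of the hex representation.
        "Content-MD5": base64.b64encode(md5.digest()).decode("ascii"),
    }
```

The returned dictionary could be merged into the headers of whatever HTTP client performs the PUT; a server that checks Content-MD5 can then reject a corrupted body instead of storing it.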
[03:48] *** RetroRomp has quit IRC (Read error: Connection reset by peer)
[03:50] *** Hobart has left
[04:01] *** Burak has joined #archiveteam
[04:01] *** Svekla has quit IRC (Read error: Connection reset by peer)
[04:08] *** ravetcofx has quit IRC (Read error: Operation timed out)
[04:15] *** SmileyG has quit IRC (Read error: Connection reset by peer)
[04:25] *** ravetcofx has joined #archiveteam
[04:29] *** Smiley has joined #archiveteam
[04:59] whenever people say MVNO I think of DVNO and everything becomes printed in gold
[04:59] details make the girls sweat even more
[05:00] ^^^^^
[05:00] *** ndiddy has quit IRC (Read error: Connection reset by peer)
[05:16] *** i336_ has quit IRC (Ping timeout: 260 seconds)
[05:18] *** maelstrom has quit IRC (Quit: Leaving)
[05:52] wat
[05:52] i love my MVNO
[06:15] *** ravetcofx has quit IRC (Read error: Operation timed out)
[06:16] *** QBcrusher has joined #archiveteam
[06:17] *** ravetcofx has joined #archiveteam
[06:29] *** unkn0wn_ has joined #archiveteam
[07:19] *** crwbot_ has quit IRC (Quit: leaving)
[07:34] *** anomie has quit IRC (Ping timeout: 250 seconds)
[07:42] *** achip has quit IRC (west.us.hub irc.Prison.NET)
[07:42] *** LastNinja has quit IRC (west.us.hub irc.Prison.NET)
[07:42] *** tpw_rules has quit IRC (west.us.hub irc.Prison.NET)
[07:42] *** K4k has quit IRC (west.us.hub irc.Prison.NET)
[07:42] *** sivoais has quit IRC (west.us.hub irc.Prison.NET)
[07:42] *** mundus201 has quit IRC (west.us.hub irc.Prison.NET)
[07:42] *** ColdIce has quit IRC (west.us.hub irc.Prison.NET)
[07:42] *** midas1 has quit IRC (west.us.hub irc.Prison.NET)
[07:42] *** achip has joined #archiveteam
[07:42] *** LastNinja has joined #archiveteam
[07:42] *** tpw_rules has joined #archiveteam
[07:42] *** K4k has joined #archiveteam
[07:42] *** sivoais has joined #archiveteam
[07:42] *** mundus201 has joined #archiveteam
[07:42] *** ColdIce has joined #archiveteam
[07:42] *** midas1 has joined #archiveteam
[07:42] *** irc.Prison.NET sets mode: +o midas1
[07:49] *** spiko has joined #archiveteam
[07:53] *** anomie has joined #archiveteam
[07:58] *** FalconK has quit IRC (Ping timeout: 260 seconds)
[08:08] *** odemg has quit IRC (Remote host closed the connection)
[08:43] *** pikhq_ has quit IRC (Ping timeout: 245 seconds)
[08:50] *** pikhq has joined #archiveteam
[08:58] *** schbirid has joined #archiveteam
[09:31] *** i336_ has joined #archiveteam
[09:56] *** FalconK has joined #archiveteam
[10:50] *** ravetcofx has quit IRC (Read error: Operation timed out)
[11:12] *** BlueMaxim has quit IRC (Quit: Leaving)
[11:15] *** Morbus has joined #archiveteam
[11:25] *** BiggieJon has joined #archiveteam
[11:28] *** Morbus has quit IRC (Quit: http://www.disobey.com/)
[11:59] *** Morbus has joined #archiveteam
[12:22] How would i go about uploading about 3TB of data? I tried messaging sketchcow, but he didn't respond
[12:41] *** pizzaiolo has joined #archiveteam
[13:07] *** i336_ has quit IRC (Ping timeout: 260 seconds)
[13:40] *** nertzy has joined #archiveteam
[13:55] *** nertzy has quit IRC (Ping timeout: 255 seconds)
[14:09] *** closure has quit IRC (Ping timeout: 244 seconds)
[14:15] *** nertzy has joined #archiveteam
[14:18] *** closure has joined #archiveteam
[14:44] *** will has quit IRC (Ping timeout: 244 seconds)
[14:45] *** will has joined #archiveteam
[14:50] *** kris33 has quit IRC (Textual IRC Client: www.textualapp.com)
[14:51] *** HCross has quit IRC (Read error: Connection reset by peer)
[14:52] *** kris33 has joined #archiveteam
[14:52] *** HarryCros has joined #archiveteam
[14:55] *** paparus has joined #archiveteam
[14:55] *** Boppen has quit IRC (Ping timeout: 194 seconds)
[14:56] *** Boppen has joined #archiveteam
[15:02] *** kris33 has quit IRC (Textual IRC Client: www.textualapp.com)
[15:03] *** kris33 has joined #archiveteam
[15:04] *** nertzy has quit IRC (Ping timeout: 255 seconds)
[15:05] *** kris33 has quit IRC (Textual IRC Client: www.textualapp.com)
[15:07] *** nertzy has joined #archiveteam
[15:07] *** kris33 has joined #archiveteam
[15:30] *** ravetcofx has joined #archiveteam
[15:49] *** kris33 has quit IRC (Textual IRC Client: www.textualapp.com)
[15:51] *** kris33 has joined #archiveteam
[15:59] *** kris33 has quit IRC (Textual IRC Client: www.textualapp.com)
[16:31] scyther, use the IA CLi tool, and if possible cut it up into 400gb chunks
[16:32] okay, so no special permission needed for stuff this big?
[16:34] Nope. Do try to break it into smaller chunks though so the derive task doesn't murder you in your sleep.
[16:34] What are you uploading?
[16:49] backup of nintendo game servers
[16:50] encrypted binaries of games, but when they pull them (they already pulled some), people who have the title keys can download from archive
[17:02] nice
[17:02] are these individual games?
[17:03] or like packs of games?
[17:03] if you have any metadata and depending on the number of games it might be nice to upload them as multiple items
[17:23] *** odemg has joined #archiveteam
[17:23] *** odemg has quit IRC (Connection closed)
[17:24] *** odemg has joined #archiveteam
[17:29] *** jonty has joined #archiveteam
[17:29] Hello!
[17:30] I'd like to ensure that all of gov.uk is archived - a random sampling of URL's from the sitemap shows it has very little
[17:30] What's the best way to go about doing this?
[17:30] It's about 300k urls
[17:30] I was about to just hit https://web.archive.org/save/ for all of them, but then realised there must be a better way using the appliance or something
[17:31] Could someone point me at the right thing to be doing?
[17:39] jonty, #cheetoflee and http://archiveteam.org/index.php?title=Government_Backup
[17:45] Thanks!
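Editor's sketch for the 16:31 to 17:03 advice about cutting a multi-terabyte upload into smaller chunks and multiple items. It only plans the grouping and does not upload anything. The function name `plan_items`, the `(name, size)` input shape, and the greedy in-order strategy are all assumptions made for illustration; the size limit is a parameter because the channel's advice on chunk size is a suggestion, not a documented rule.

```python
def plan_items(files, max_bytes):
    """Greedily group (name, size_in_bytes) pairs into batches of at most max_bytes.

    Hypothetical helper. Input order is preserved, and a single file larger
    than max_bytes ends up in a batch of its own rather than being split.
    """
    batches = []
    current = []
    current_size = 0
    for name, size in files:
        # Close the current batch if adding this file would exceed the limit.
        if current and current_size + size > max_bytes:
            batches.append(current)
            current = []
            current_size = 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Each resulting batch could then be uploaded as its own item with whatever tool is in use; keeping the input order means files that belong together (for example, one game's files listed consecutively) tend to stay in the same item.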
[17:58] *** btfo has quit IRC (Read error: Operation timed out)
[17:59] *** unkn0wn_ has quit IRC ()
[18:04] *** Morbus has quit IRC (http://www.disobey.com/)
[18:10] *** odemg has quit IRC (Remote host closed the connection)
[18:53] *** unkn0wn_ has joined #archiveteam
[18:54] *** odemg has joined #archiveteam
[18:59] *** maz1324 has joined #archiveteam
[19:01] *** maz1324 has quit IRC (Remote host closed the connection)
[19:04] *** maz1324 has joined #archiveteam
[19:07] *** odemg has quit IRC (Remote host closed the connection)
[19:10] *** ZexaronS has joined #archiveteam
[19:11] scyther/rocode: iirc a much smaller size is nicer, eg 50G
[19:11] *** odemg has joined #archiveteam
[19:11] that enables IA to distribute items across storage much nicer
[19:19] yeah, it also helps when people want to download it
[19:19] it happens sometimes
[19:24] *** maz1324 has quit IRC (Quit: http://chat.efnet.org )
[19:55] *** mls has quit IRC (Ping timeout: 250 seconds)
[20:05] *** maelstrom has joined #archiveteam
[20:18] *** Ravenloft has joined #archiveteam
[20:19] *** odemg has quit IRC (Remote host closed the connection)
[20:24] *** mls has joined #archiveteam
[20:33] *** lordcosmo has joined #archiveteam
[20:39] *** lordcosmo has quit IRC (Read error: Connection reset by peer)
[20:40] *** lordcosmo has joined #archiveteam
[20:49] *** lordcosmo has quit IRC (Ping timeout: 250 seconds)
[20:50] *** lordcosmo has joined #archiveteam
[20:55] *** lordcosmo has quit IRC (Ping timeout: 250 seconds)
[20:56] *** lordcosmo has joined #archiveteam
[21:03] *** tsr has quit IRC (Quit: foo)
[21:05] *** lordcosmo has quit IRC (Ping timeout: 250 seconds)
[21:07] *** lordcosmo has joined #archiveteam
[21:07] *** tsr has joined #archiveteam
[21:12] cool article, if you missed it: http://venturebeat.com/2017/02/14/the-internet-archive-wants-to-host-pacer-records-from-u-s-courts-and-make-them-available-for-free/
[21:15] *** cheez has joined #archiveteam
[21:16] just heard about this effort, wanted to say thanks for making it
[21:16] *** schbirid has quit IRC (Quit: Leaving)
[21:18] *** VADemon has joined #archiveteam
[21:26] SketchCow: how goes the noaa archiving thing? iirc someone on /r/datahoarders has the whole thing (400GB or something)
[21:29] *** n00b616 has joined #archiveteam
[21:34] *** n00b616 has quit IRC (Quit: Page closed)
[21:35] *** BlueMaxim has joined #archiveteam
[21:37] cf. recap
[21:38] we'll get them, will they or won't they.
[21:38] *** lordcosmo has quit IRC (Ping timeout: 250 seconds)
[21:45] *** icedice has joined #archiveteam
[22:10] *** Honno has quit IRC (Read error: Connection reset by peer)
[22:13] *** ndiddy has joined #archiveteam
[22:21] *** crwbot has joined #archiveteam
[22:33] SketchCow: when you've got a few minutes - /pipeline is broken again
[22:35] *** Oddy has joined #archiveteam
[22:41] goodevening
[22:41] *** btfo has joined #archiveteam
[22:54] *** icedice has quit IRC (Read error: error:1408F119:SSL routines:SSL3_GET_RECORD:decryption failed or bad record mac)
[22:55] *** icedice has joined #archiveteam
[23:01] https://www.reddit.com/r/HumansBeingBros/comments/5u1aj6/on_saturday_morning_200_hackers_at_uc_berkeley/
[23:02] ^- actually related.
[23:42] *** QBcrusher has quit IRC (Ping timeout: 492 seconds)
[23:44] *** odemg has joined #archiveteam
[23:48] *** odemg has quit IRC (Remote host closed the connection)
[23:49] *** odemg has joined #archiveteam
[23:57] *** Famicoman has joined #archiveteam
[23:58] *** ats has quit IRC (Read error: Operation timed out)