[00:10] *** Boppen has quit IRC (Read error: Connection reset by peer) [00:11] *** Boppen has joined #archiveteam-bs [00:23] *** icedice2 has joined #archiveteam-bs [00:26] *** icedice has quit IRC (Ping timeout: 260 seconds) [00:33] *** Boppen has quit IRC (Read error: Connection reset by peer) [00:34] *** Boppen has joined #archiveteam-bs [00:46] *** Boppen has quit IRC (Read error: Connection reset by peer) [00:48] *** Boppen has joined #archiveteam-bs [00:51] *** VADemon has joined #archiveteam-bs [01:08] *** icedice2 has quit IRC (Read error: Connection reset by peer) [01:19] *** BlueMaxim has joined #archiveteam-bs [01:31] *** ZexaronS has quit IRC (Leaving) [01:32] *** schbirid2 has joined #archiveteam-bs [01:33] *** j08nY has quit IRC (Quit: Leaving) [01:34] *** JRWR-Work is now known as JRWR [01:35] *** schbirid has quit IRC (Ping timeout: 268 seconds) [01:38] *** schbirid2 has quit IRC (Ping timeout: 250 seconds) [01:41] *** schbirid has joined #archiveteam-bs [02:28] Is it strange that I find it fun to run a Rsync target [02:28] JRWR: you might be a datahoarder [02:28] :P [02:28] LOOK AT THESE GRAPHS http://jrwr.io:19999 [02:29] * JRWR starts to drool [02:29] lol [02:31] I love graphs, I look at my munin ones all the time for no reason [02:31] :p [02:31] netdata has awesome live graphs [02:33] *** dashcloud has quit IRC (Read error: Operation timed out) [02:35] JRWR: you only have 1 GB of RAM? [02:35] On that box its 16GB [02:35] oh, I didn't even look at "cached" [02:36] Its handling the Pixiv project right now [02:37] is pixiv going down soon or something? [02:37] I haven't been paying attention [02:37] lol I turned on my nas after it being off for a few weeks, the rootfs is hosed [02:37] john@oblivion:~$ htop [02:37] -bash: /usr/bin/htop: Input/output error [02:38] are SSDs really that bad without power [02:38] Odd0002: chat.pixiv is [02:38] ok [02:48] Are there any media outlets that have said who the attackers in London are, or anything about them? 
[02:49] Frogging: If they're worn-out, maybe? A healthy SSD should definitely be able to go a few weeks without power [02:50] yeah I don't know yet if the SSD is broken or just the filesystem [02:50] I don't trust it anymore either way though [02:51] Yeah. At the very least I'd do a secure erase on it to reset its internal state [02:52] MrRadar: Man we are blazing now [02:52] we are up 3x the speed now [02:53] Yeah, FOS has lots of storage but it taps out fairly quickly on IOPS [02:54] All I got is 8TB [02:55] *** kyounko has quit IRC (Read error: Operation timed out) [02:56] I was thinking about that [02:56] cross uploading to FOS to make sure in case of my server failing [02:56] there is a backup [02:56] but I would do it in big chunks [02:57] wait what, SSDs die if they're not on? [02:58] They *can* but any decent drive should be able to go for a year or more unpowered [02:58] Yes [02:58] Bit Fade [02:58] Their flash cells leak electrons [02:58] Even USB Drives do [02:59] And SD/CF/whatever flash cards [02:59] Spinning Rust will Decay as well [02:59] but it takes MUCH longer [03:00] how long does it take for SSD's? I know I have a 20 GB HDD that came with win98 that still works [03:00] when was the last time it was powered on [03:01] it's still on [03:01] oh, the HDD? 
[03:02] a few weeks ago [03:02] I found an article on SSD unpowered data retention: http://www.anandtech.com/show/9248/the-truth-about-ssd-data-retention [03:03] Includes this graph, showing how many weeks a drive is expected to retain data based on the temperature at which the data was written and the temperature the drive is stored: http://images.anandtech.com/doci/9248/3_575px.PNG [03:04] Gotta keep them powered onj [03:04] Mostly every type of storage will decay [03:05] huh, guess I need to heat up my SSD while it's on [03:05] lol [03:05] lol, my rsync MOTD shows up on the warriors [03:06] maybe attach a peltier device to my SSD that heats one side when the PC is on and cools it when its off [03:08] if that data is correct then it might lengthen the data's lifetime [03:09] if I heat it to 55C and cool it to 25 when its off then I could get 8x the data lifetime! [03:10] Reading the article, that chart is for a drive at the end of its lifespan. Newer drives should retain data much longer [03:12] I know, but now I know what to do with my SSD after it's near the end of its lifespan [03:12] heat it when I power it on then put it in the freezer afterwards [03:22] Its like playing a incermental game, watching the numbers go up and up [03:38] * JRWR buys some Archive Team Stickets [03:38] Speaking of rsync, can we have seesaw detect when rsync failed because the files don't exist and just fail that job? Right now I'm going to have to cancel 5 jobs because the 6th is stuck in an infinite upload failure loop [03:39] lol [03:39] What project? is it savepixiv? [03:39] It's for SPUF [03:39] But I had to do the same for some a pixiv pipeline earlier today [03:47] MrRadar: it should never try to upload nonexistent files in the first place? 
[03:47] Ya [03:47] That should be the pipline's fault [03:48] I agree with that part too, but since it seems like it happens every so often it should probably still be handled [03:48] Ya, overall if a job keeps failing outright [03:48] the job manager should just nuke it and send it back [03:50] MrRadar: have you filed a bug about this occurring? [03:51] No, but that's a good idea [03:51] rsync trying to upload nonexistent files sounds indicative of a bigger, more serious issue :) [03:51] it's an invalid state that should never occur [03:52] Looks like there's already a bug for the missing data issue: https://github.com/ArchiveTeam/seesaw-kit/issues/48 [03:52] MrRadar: hold on, that's failing on a nonexistent *directory* [03:52] not nonexistent *files* [03:53] Yeah, sorry for not being precise, that's the exact issue I'm seeing [03:53] LOL, I even commented on the issue at the very end [03:54] Almost exactly 1 year ago [03:55] I would love to become a secondary rsync ingress for AT [03:55] have it auto sync the uploaded data to FOS and clear out the old when I have confirmed data upload [03:56] MrRadar: hm :/ [03:56] do it in bulk transfers and such since FOS iops are kind of low [03:57] afk 1 hour [03:57] *** JRWR has quit IRC (Quit: Page closed) [03:59] *** tfgbd_znc has quit IRC (Ping timeout: 600 seconds) [04:17] *** tfgbd_znc has joined #archiveteam-bs [04:56] *** SN4T14 has quit IRC (Quit: ZNC 1.6.3 - http://znc.in) [06:41] *** SN4T14 has joined #archiveteam-bs [06:55] *** Aranje has quit IRC (Ping timeout: 506 seconds) [07:54] *** SHODAN_UI has joined #archiveteam-bs [08:04] joepie91, arkiver: Just skimmed through the logs, but that “garbage” looks alot like HTTP chunked transfer encoding to me. 
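A rough sketch of the behaviour asked for in the 03:38-03:56 exchange above: stop retrying an upload once rsync reports an error that a retry cannot fix. This is not seesaw's actual API; `run_upload`, `run_once` and `PERMANENT_EXIT_CODES` are hypothetical names, and treating exit code 3 as permanent is an assumption made for illustration.

```python
from typing import Callable, FrozenSet

# rsync documents exit code 3 as "Errors selecting input/output files, dirs".
# Treating it as a permanent failure is an assumption made for this sketch.
PERMANENT_EXIT_CODES: FrozenSet[int] = frozenset({3})


def run_upload(run_once: Callable[[], int],
               max_attempts: int = 10,
               permanent: FrozenSet[int] = PERMANENT_EXIT_CODES) -> bool:
    """Call run_once until it returns 0, giving up early on permanent errors.

    Returns True on success and False if the item should be failed and
    handed back instead of being retried again.
    """
    for _ in range(max_attempts):
        code = run_once()
        if code == 0:
            return True
        if code in permanent:
            # A missing source directory will not appear on the next attempt.
            return False
    return False
```

Here `run_once` could wrap something like `subprocess.run([...]).returncode` for the real rsync command; a real pipeline would presumably also wait between attempts, which is left out to keep the sketch short.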
[08:05] *** Mayonaise has quit IRC (Read error: Operation timed out) [08:17] *** Mayonaise has joined #archiveteam-bs [08:18] *** SHODAN_UI has quit IRC (Remote host closed the connection) [09:59] *** Whopper_ has quit IRC (Ping timeout: 250 seconds) [10:05] *** Whopper has joined #archiveteam-bs [10:14] *** Whopper has quit IRC (Read error: Operation timed out) [10:18] *** Whopper has joined #archiveteam-bs [10:26] *** RichardG has quit IRC (Read error: Connection reset by peer) [10:26] *** RichardG has joined #archiveteam-bs [10:27] *** SHODAN_UI has joined #archiveteam-bs [10:43] *** godane has quit IRC (Ping timeout: 506 seconds) [10:50] *** BlueMaxim has quit IRC (Read error: Operation timed out) [10:51] *** BlueMaxim has joined #archiveteam-bs [11:03] *** j08nY has joined #archiveteam-bs [11:17] *** antomati_ has joined #archiveteam-bs [11:17] *** swebb sets mode: +o antomati_ [11:18] *** antomatic has quit IRC (Read error: Operation timed out) [11:21] *** antomati_ is now known as antomatic [11:26] *** fie has quit IRC (Ping timeout: 246 seconds) [11:29] rsync error: errors selecting input/output files, dirs (code 3) at flist.c(2118) [sender=3.1.1] [11:29] Process RsyncUpload returned exit code 3 for Item threads:2755750-2755759 [11:29] ^ so this what you were discussing 12 hours ago just happened to me too [11:40] *** SHODAN_UI has quit IRC (Remote host closed the connection) [11:40] *** fie has joined #archiveteam-bs [11:47] *** ZexaronS has joined #archiveteam-bs [12:05] *** GLaDOS has quit IRC (Read error: Operation timed out) [13:02] *** BlueMaxim has quit IRC (Read error: Operation timed out) [13:06] *** bmcginty has quit IRC (Ping timeout: 268 seconds) [13:08] *** bmcginty has joined #archiveteam-bs [13:35] *** tfgbd_znc has quit IRC (Ping timeout: 600 seconds) [13:51] *** tfgbd_znc has joined #archiveteam-bs [13:53] *** icedice has joined #archiveteam-bs [14:11] *** fio67 has joined #archiveteam-bs [14:11] PurpleSym: There are allegedly some 
instances where the garbage is space-delimited instead of newline-delimited. [14:12] guys, is 6 warriors going to get me IP banned from anything besides yahoo and that other project? [14:12] Got an example URL, timmc? [14:14] I don't, just reporting what I saw in chat. [14:15] Hello, I know you guys aren't archive.org, but do you know if there's a channel for it? Tried ##archive on freenode but it's invite-only. [14:18] PurpleSym: There was this *very* interesting report joepie91 generated for (I think) just one URL, and by gosh it does look like garbled chunked transfer-encoding: http://sprunge.us/RjWi [14:19] Would be interesting to look at the actual WARC. [14:19] There are overlapping context chunks that have the right byte counts. (And look, there's a zero at the end.) [14:20] fio67: There's #internetarchive here on EFNet but I'm not sure if it's "official" [14:20] MrRadar: thanks, I'll take a look [14:22] *** fie has quit IRC (Ping timeout: 600 seconds) [14:26] *** Kalroth has joined #archiveteam-bs [14:28] PurpleSym: I think https://archive.org/download/archiveteam_portalgraphics_20160727140857/portalgraphics_20160727140857.megawarc.warc.gz and https://web.archive.org/web/20160724001629/http://www.portalgraphics.net/pg/illust/?image_id=10575 [14:31] That URL is not listed in the CDX as far as I see. [14:33] *** fie has joined #archiveteam-bs [14:34] *** tfgbd_znc has quit IRC (Ping timeout: 600 seconds) [14:35] PurpleSym: timmc: transfer chunk encoding sizes are decimal though, not hexadecimal? [14:35] that having been said it does seem to generally match up [14:35] in terms of size [14:35] Nope, hex: https://tools.ietf.org/html/rfc2616#section-3.6.1 [14:35] huh. [14:37] joepie91: Those hex chunks that are space-delimited... is that only in the version displayed on web.archive.org, or also in the WARC? [14:38] no idea, haven't checked the source. I think it's easier for arkiver to check that [14:47] *** Kalroth has quit IRC (Quit: Bye!) 
[14:52] *** tfgbd_znc has joined #archiveteam-bs [14:55] *** JRWR has joined #archiveteam-bs [15:00] *** fio67 has quit IRC (Quit: Page closed) [15:07] *** icedice has quit IRC (Ping timeout: 506 seconds) [15:15] *** Kalroth has joined #archiveteam-bs [15:26] *** pikhq has quit IRC (Ping timeout: 245 seconds) [15:29] *** tfgbd_znc has quit IRC (Ping timeout: 600 seconds) [15:31] *** pikhq has joined #archiveteam-bs [15:37] *** Aranje has joined #archiveteam-bs [15:45] *** tfgbd_znc has joined #archiveteam-bs [15:45] *** SHODAN_UI has joined #archiveteam-bs [16:05] *** Honno has joined #archiveteam-bs [16:05] *** tfgbd_znc has quit IRC (Ping timeout: 600 seconds) [16:11] *** w0rp has quit IRC (Ping timeout: 245 seconds) [16:11] *** tfgbd_znc has joined #archiveteam-bs [16:12] *** w0rp has joined #archiveteam-bs [16:49] *** dashcloud has joined #archiveteam-bs [17:37] *** ZexaronS- has joined #archiveteam-bs [17:40] *** ZexaronS has quit IRC (Read error: Operation timed out) [17:41] *** ZexaronS- has quit IRC (Client Quit) [18:00] *** JRWR has quit IRC (Quit: Page closed) [18:00] *** antomati_ has joined #archiveteam-bs [18:00] *** swebb sets mode: +o antomati_ [18:02] *** Silvan has quit IRC (Read error: Operation timed out) [18:02] *** antomatic has quit IRC (Ping timeout: 250 seconds) [18:02] *** antomati_ is now known as antomatic [18:04] *** SilSte has joined #archiveteam-bs [18:30] *** SHODAN_UI has quit IRC (Remote host closed the connection) [18:31] *** j08nY has quit IRC (Read error: Operation timed out) [18:40] *** JRWR has joined #archiveteam-bs [18:41] All the DATA, Give me maor! So as a just in case it happens [18:41] when I get to 75% full on my rsync ingress server, what do [18:42] add more HDDs [18:42] :p [18:42] its a OVH Box [18:42] JRWR: also, you probably should be in #DataHoarder on Freenode :P [18:43] I am already :3 [18:43] anyway, that wasn't totally serious advice [18:43] oh, you are? 
[18:43] well once I get my server back [18:43] not under this nick though? [18:43] ah [18:43] ya my 190TB Plex Archive is very nice [18:43] I used my main server as a Weechat instance [18:43] since I blanked it for this project I dont have that setup atm [18:44] since I have 1Gbps up going to waste, I though about syncing to FOS [18:45] but I would want to talk to SketchCow before I did, make sure everything was secure [18:45] I could do bulk uploads to it, then purge the old jobs that I have confrimed on FOS [18:45] but I've got another 6TB to go, so we have tons of time [18:46] if its even needed at all [19:04] 190TB on ovh? what do you pay? [19:07] im guessing plex cloud [19:07] synced with a google drive [19:08] thatd be a hefty chunk of change :D [19:08] I got one of the "Unlimted" google drive accounts [19:08] so thats where it is [19:08] UnEncrypted so it can be deduped [19:09] ah nice [19:11] The server I am using now for rsync ingress was my general server (had plex, website, other crap) [19:11] I wiped it and raid0 it for this, and man I'm impressed [19:13] my main is an EG-16 [19:13] they recently lowered the prices on them [19:13] but im locked in at my original price [19:13] :( [19:13] Thats what I'm using [19:13] im at 78$ [19:14] same [19:14] 79 [19:14] it's like 74 now [19:14] lol [19:14] I really want to find another one [19:14] that is cheaper with the same stats [19:16] dunno if youll be able to [19:16] unmetered BW is the bomb [19:18] everybody wants unmetered, but noone wants to pay for it :P [19:18] ^^ [19:19] i like cheap hetzner auction servers [19:19] JRWR: I'll handle the uploads to IA or FOS [19:19] 6tb for ~30€ but limited bw [19:20] "unlimited" bw on that one [19:20] nobody will beat OVH for unmetered [19:20] ^ [19:20] that's pretty much certain [19:20] anybody who does will almost certainly go bankrupt in under a year [19:20] that's why they have my service [19:20] hehe [19:20] (and they're usually just OVH resellers anyway) [19:20] 
unlimited as in over 20TB = your 1gbit goes 10mbit [19:21] (20TB outgoing) [19:21] so it's really not that bad for a hetzner server [19:21] like, probably the only reason OVH can get away with unmetered with a relatively good network is that they have a massive network of their own [19:21] none of the other big names in budget servers have that [19:21] tier 1 like, yeah [19:21] and none of the new players will have it either, at least not for a while [19:22] arkiver: still got the logins I gave you [19:22] joepie91: theyre expanding rapidly too [19:22] hence why it's pretty much certain that nobody will beat OVH without giving up on quality [19:22] for unmetered stuff :P [19:22] voidsta: yeah [19:22] been following the ceo on twitter [19:22] new dc here [19:22] new dc there [19:22] everyone gets a dc [19:22] lol [19:22] hehe, was about to make that joke [19:22] :D [19:23] JRWR: yes [19:29] Cool [19:38] also the scaleway arm64 instances are NICE [19:39] they are faster then the x86/arm ones [19:42] OVH is really the only name in dedicated servers without the fuss [19:42] Atleast state side [19:42] most of the other places are just meh [19:49] *** Igloo has quit IRC (Read error: Operation timed out) [19:50] agree [19:57] *** bmcginty has quit IRC (Ping timeout: 250 seconds) [20:00] *** Igloo has joined #archiveteam-bs [20:04] *** bmcginty has joined #archiveteam-bs [20:04] *** ZexaronS has joined #archiveteam-bs [20:21] *** pizzaiolo has joined #archiveteam-bs [20:30] Is this page broken for anyone else? https://archive.org/search.php?query=collection%3Aarchivebot&sort=-publicdate [20:30] The HTML just stops somewhere in the navigation links. [20:30] works for me [20:30] Hm, weird. [20:30] it has magic scrolling going on [20:30] when you get to the bottom [20:31] I know that, but I only get a partial navigation bar, no content at all. [20:31]
  • BLOG
  • [20:31]
  • ^ End of the HTML [20:33] Ya [20:33] Im getting full HTML [20:33] https://hastebin.com/ipupobevun.xml [20:34] Hm yeah, it works from one of my servers, but from this machine, I get it on both two different Firefox profiles and cURL. What the hell? [20:38] *** j08nY has joined #archiveteam-bs [20:40] *** kristian_ has joined #archiveteam-bs [20:42] Looks like the connection is killed after receiving about 32k of data. [20:44] JAA: where are you located, which network provider do you use [20:44] https://archive.org/search.php works, but any actual search has the same issue. [20:44] Sanqui: Switzerland, currently on a Swisscom connection (not my own) [20:45] ok. no problems from czech republic [20:46] It works from another machine within Switzerland. [20:46] (Not on the same network) [20:48] *** pizzaiolo has quit IRC (Quit: pizzaiolo) [20:55] Any of you going to defcon this year? [20:56] *** SHODAN_UI has joined #archiveteam-bs [21:00] huh, my yahoo answers thing has been running for 22 hours now [21:04] *** icedice has joined #archiveteam-bs [21:04] The IA page randomly started working again. ¯\_(ツ)_/¯ [21:07] its magic! [21:10] *** pizzaiolo has joined #archiveteam-bs [21:18] *** pizzaiolo has quit IRC (Read error: Connection reset by peer) [21:18] *** pizzaiolo has joined #archiveteam-bs [21:27] *** godane has joined #archiveteam-bs [21:37] *** ZexaronS has quit IRC (Leaving) [21:41] *** kristian_ has quit IRC (Quit: Leaving) [21:59] Is there any way to get from a Wayback Machine page to the corresponding item (or even better, WARC file)? [22:00] *** SHODAN_UI has quit IRC (Remote host closed the connection) [22:10] you could not download them anyways [22:10] *** Ravenloft has joined #archiveteam-bs [22:11] This particular one is from the ArchiveBot collection, so I most likely could. [22:21] JAA: try the viewer then https://archive.fart.website/archivebot/viewer/ [22:22] Sanqui: Yeah, I tried that, but didn't find any match. 
[22:25] I'm trying to look into the corruption issue that was discussed yesterday. Much of the discussion focused on wget-lua, but as DoomTay mentioned earlier in #archivebot, at least one ArchiveBot grab was also affected: https://web.archive.org/web/20160615222159/http://www.portalgraphics.net/pg/illust/?image_id=10575 [22:38] How certain are we that this "corruption" is really a client-side issue? It definitely looks like chunked transfer encoding gone wrong. But that could also be a missing "Transfer-Encoding: chunked" header from the portalgraphics server. Have we found any examples on other domains yet? [23:20] i can't upload to IA [23:20] 2.4%Warning: Transient problem: HTTP error Will retry in 5 seconds. 10 retries [23:20] i keep getting that [23:24] Here's a list mapping the URLs SketchCow posted yesterday to the corresponding IA item: https://hastebin.com/raw/iruvobodor . I also included some additional examples I found. [23:28] *** jtn2 has left [23:45] Hmm. I'm looking at archiveteam_portalgraphics_20160727144032 now. Their server does send the Transfer-Encoding: chunked header according to the WARC. And there is no double chunked encoding or something like that. [23:45] This makes me think that maybe the IA processes these incorrectly. [23:50] Here's the WARC records for https://web.archive.org/web/20160725184715/http://www.portalgraphics.net/pg/illust/?image_id=21107&lang=en : https://hastebin.com/raw/gokiwuzage [23:50] (Hastebin seems to screw up the Unicode characters there, disregard that.) [23:59] I suspect that it's due to the space character after "5c". This space doesn't conform to the specs ( https://tools.ietf.org/html/rfc7230#section-4.1 ), which define a chunk as `chunk-size [ chunk-ext ] CRLF chunk-data CRLF`. chunk-size is the size of the chunk in hexadecimal digits (upper or lower case), chunk-ext is an optional extension of the algorithm and is `*( ";" chunk-ext-name [ "=" chunk-ext-val ] )`, i.e. starts always with a semicolon.
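A minimal sketch of the theory in the last message, assuming a decoder that follows the RFC 7230 grammar literally: a chunk-size line such as `5c ` (with a trailing space) is rejected by a strict parser, while a lenient one that tolerates the whitespace decodes the body. The function `decode_chunked` and its `strict` flag are hypothetical; nothing here is taken from wget-lua, ArchiveBot, or the Wayback Machine's actual code.

```python
import re

# Strict: chunk-size is 1*HEXDIG, and any extension must start with ";".
_STRICT_LINE = re.compile(rb"([0-9A-Fa-f]+)(?:;.*)?")
# Lenient: additionally tolerate spaces or tabs around the size.
_LENIENT_LINE = re.compile(rb"[ \t]*([0-9A-Fa-f]+)[ \t]*(?:;.*)?")


def decode_chunked(body: bytes, strict: bool = True) -> bytes:
    """Decode an HTTP/1.1 chunked message body (trailers are ignored)."""
    pattern = _STRICT_LINE if strict else _LENIENT_LINE
    out = bytearray()
    pos = 0
    while True:
        eol = body.find(b"\r\n", pos)
        if eol == -1:
            raise ValueError("missing CRLF after chunk-size line")
        line = body[pos:eol]
        match = pattern.fullmatch(line)
        if match is None:
            raise ValueError(f"invalid chunk-size line: {line!r}")
        size = int(match.group(1), 16)  # sizes are hexadecimal
        pos = eol + 2
        if size == 0:
            return bytes(out)  # last-chunk reached
        chunk = body[pos:pos + size]
        if len(chunk) != size or body[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("chunk data truncated or not followed by CRLF")
        out += chunk
        pos += size + 2
```

If a replay pipeline used a strict parser like this and fell back to passing the raw bytes through on failure, the chunk sizes would show up in the page as "garbage", which would match what was observed; whether that is what actually happens is still only a guess.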