[00:25] godane: I've been working on archiving that for years :)
[00:25] had to restart when they changed format a year or so ago
[00:25] but been steadily going since
[00:27] *** Mateon1 has quit IRC (Read error: Operation timed out)
[00:28] *** Mateon1 has joined #archiveteam-bs
[00:30] looks like I've got through '90 done so far
[00:32] I may have found a way to grab 60 Minutes
[01:11] *** BlueMaxim has quit IRC (Leaving)
[01:13] *** BlueMaxim has joined #archiveteam-bs
[02:17] *** antomatic has quit IRC (Read error: Connection reset by peer)
[02:18] *** antomatic has joined #archiveteam-bs
[02:18] *** swebb sets mode: +o antomatic
[02:18] *** decay_ has quit IRC (Read error: Operation timed out)
[02:19] *** decay_ has joined #archiveteam-bs
[02:46] *** username1 has joined #archiveteam-bs
[02:48] *** zhongfu has quit IRC (Ping timeout: 260 seconds)
[02:50] *** schbirid2 has quit IRC (Read error: Operation timed out)
[03:09] *** zhongfu has joined #archiveteam-bs
[03:53] *** underscor has quit IRC (Remote host closed the connection)
[04:10] *** octothorp has quit IRC (Remote host closed the connection)
[04:11] *** octothorp has joined #archiveteam-bs
[04:40] *** nyaomi has quit IRC (Quit: meow)
[04:51] *** qw3rty111 has joined #archiveteam-bs
[04:57] *** qw3rty119 has quit IRC (Read error: Operation timed out)
[05:03] *** nyaomi has joined #archiveteam-bs
[05:33] *** nyaomi has quit IRC (Ping timeout: 245 seconds)
[05:56] *** Fletcher has joined #archiveteam-bs
[06:00] *** nyaomi has joined #archiveteam-bs
[06:00] *** ranav has joined #archiveteam-bs
[06:07] *** ranavalon has quit IRC (Read error: Operation timed out)
[06:19] *** Pixi has quit IRC (Quit: Pixi)
[06:28] *** Pixi has joined #archiveteam-bs
[06:40] *** godane has quit IRC (Quit: Leaving.)
[07:03] *** BlueMaxim has quit IRC (Read error: Connection reset by peer)
[07:04] *** BlueMaxim has joined #archiveteam-bs
[07:14] *** BlueMaxim has quit IRC (Ping timeout: 600 seconds)
[07:16] *** BlueMaxim has joined #archiveteam-bs
[07:24] *** godane has joined #archiveteam-bs
[07:35] *** BlueMaxim has quit IRC (Ping timeout: 252 seconds)
[07:47] *** BlueMaxim has joined #archiveteam-bs
[07:56] *** BlueMaxim has quit IRC (Ping timeout: 252 seconds)
[08:18] *** schbirid2 has joined #archiveteam-bs
[08:23] *** username1 has quit IRC (Read error: Operation timed out)
[08:29] *** BlueMaxim has joined #archiveteam-bs
[09:06] *** pizzaiolo has joined #archiveteam-bs
[10:03] *** BnAboyZ has quit IRC (Quit: Ping timeout (120 seconds))
[10:08] *** BnAboyZ has joined #archiveteam-bs
[10:17] *** SN4T14 has quit IRC (Ping timeout: 260 seconds)
[10:22] *** SN4T14 has joined #archiveteam-bs
[10:37] *** BlueMaxim has quit IRC (Read error: Connection reset by peer)
[10:56] *** Fletcher_ has quit IRC (Read error: Operation timed out)
[10:56] *** Fletcher_ has joined #archiveteam-bs
[11:31] *** pizzaiolo has quit IRC (Remote host closed the connection)
[11:33] *** pizzaiolo has joined #archiveteam-bs
[11:36] *** pizzaiolo has quit IRC (Remote host closed the connection)
[11:37] *** pizzaiolo has joined #archiveteam-bs
[11:46] *** pizzaiolo has quit IRC (Remote host closed the connection)
[11:47] *** pizzaiolo has joined #archiveteam-bs
[11:47] *** pizzaiolo has quit IRC (Remote host closed the connection)
[11:48] *** pizzaiolo has joined #archiveteam-bs
[11:54] *** pizzaiolo has quit IRC (Remote host closed the connection)
[11:55] *** pizzaiolo has joined #archiveteam-bs
[11:55] *** pizzaiolo has quit IRC (Remote host closed the connection)
[11:56] *** pizzaiolo has joined #archiveteam-bs
[12:11] *** pizzaiolo has quit IRC (Remote host closed the connection)
[12:13] *** pizzaiolo has joined #archiveteam-bs
[12:14] *** pizzaiolo has quit IRC (Remote host closed the connection)
[12:16] *** pizzaiolo has joined #archiveteam-bs
[12:17] *** pizzaiolo has quit IRC (Remote host closed the connection)
[12:19] *** pizzaiolo has joined #archiveteam-bs
[12:19] *** pizzaiolo has quit IRC (Read error: Connection reset by peer)
[12:21] *** pizzaiolo has joined #archiveteam-bs
[12:22] *** pizzaiolo has quit IRC (Remote host closed the connection)
[12:22] *** pizzaiolo has joined #archiveteam-bs
[12:22] *** pizzaiolo has quit IRC (Read error: Connection reset by peer)
[12:23] *** pizzaiolo has joined #archiveteam-bs
[12:23] *** pizzaiolo has quit IRC (Read error: Connection reset by peer)
[12:24] *** pizzaiolo has joined #archiveteam-bs
[12:24] *** pizzaiolo has quit IRC (Remote host closed the connection)
[12:28] *** pizzaiolo has joined #archiveteam-bs
[12:28] *** pizzaiolo has quit IRC (Remote host closed the connection)
[12:30] *** pizzaiolo has joined #archiveteam-bs
[12:30] *** JAA sets mode: +b *!*pizzaiolo@186.205.2.*
[12:30] *** pizzaiolo was kicked by JAA (Fix your connection please.)
[12:47] *** RichardG_ has joined #archiveteam-bs
[12:54] *** RichardG has quit IRC (Read error: Operation timed out)
[12:56] *** RichardG_ has quit IRC (Read error: Connection reset by peer)
[12:56] *** RichardG has joined #archiveteam-bs
[13:30] *** JAA sets mode: -b *!*pizzaiolo@186.205.2.*
[14:07] *** zyphlar has quit IRC (Max SendQ exceeded)
[14:07] *** zyphlar has joined #archiveteam-bs
[16:01] *** odemg has quit IRC (Quit: Leaving)
[16:19] *** ubahn has joined #archiveteam-bs
[16:42] *** klondike has joined #archiveteam-bs
[16:44] Hi, I want to report a popular Spanish DeviantArt-like site that is dying on the 31st of January, as per http://subcultura.es/blogs/Neverwolf/anuncio-importante-32407/
[16:45] I have started archiving the site myself with httrack and managed to bypass the age restrictions (and the "we use cookies" announcement), but there is no way I can get the whole thing done in time.
[16:46] Another hacker told me about you, so I'd like to know if and how I can get help from you archiving the site and its subdomains, as you are a lot more skilled than I am.
[16:47] *** odemg has joined #archiveteam-bs
[16:52] OK, we can look into it
[16:52] What are you using so far to do this? What limits have you come up against?
[16:52] *** klondike2 has joined #archiveteam-bs
[16:52] I see you mention cookies, have you generated a bunch of user accounts etc?
[16:52] 16:52 < Igloo> OK, we can look into it
[16:52] 16:52 < Igloo> What are you using so far to do this? What limits have you come up against?
[16:53] 16:52 -!- klondike2 [~klondike@c80-216-57-193.bredband.comhem.se] has joined #archiveteam-bs
[16:53] 16:52 < Igloo> I see you mention cookies, have you generated a bunch of user accounts etc?
[16:53] Also, do you have any idea how large it is in total (how many subdomains, etc.)?
[16:54] Igloo: so far I have been using httrack. I got the list of domains from the ranking site, but it should be reachable from the main domain.
[16:54] Basically the site spans subcultura.es and *.subcultura.es
[16:55] All of them point to the same server though.
[16:55] I have used httrack; the main issue is that database accesses are slow (as an admin said), so most dynamic pages (like forum posts or webcomic entries) are slow to generate
[16:56] *** klondike has quit IRC (Quit: Page closed)
[16:56] *** klondike2 is now known as klondike
[16:56] The total number of subdomains is around 8100
[16:57] subcultura.es is the largest one, as it contains a lot of things, including author and user profiles.
[16:58] I have also removed it from the list, except for any content used by the subdomains, as there is no way I can back up the whole thing.
[16:58] Regarding the limits so far:
[16:58] 1. Some of the content from the subdomains is hosted on the main subcultura.es, so I'm afraid I may be missing something.
[16:59] 2. 
Some domains start with a hyphen, so I had to patch my server's glibc to be able to resolve those domains correctly
[16:59] 3. EU law requires a stupid banner saying they use cookies; I have figured out how to get rid of it.
[17:01] 4. Some sites are behind a "confirm you are over 18yo" post-wall; I have addressed that with cookies too.
[17:01] 5. I don't think more than 4 sockets sending requests from my IP will be appreciated. This is slow, so I had to exclude the main website (subcultura.es) except for content I got by reversing the modules of the subdomains.
[17:02] Re: 2. Wow, who the hell thought that was a good idea? I don't think such domain names are even allowed by the RFCs. Do you have an example?
[17:02] 6. Some sites have a hidden image that is shown if you, as a registered user, carry out an action that can only be done once every 24 hours across the whole site, so I have also excluded that.
[17:03] JAA: http://--.subcultura.es/ It should work on Windows and Mac, but not on Linux.
[17:03] Re: 4. Do those blocks ever appear on subcultura.es itself or only on subdomains? (I've clicked around a bit and didn't see one.)
[17:04] JAA: they may appear on subcultura.es too, but I haven't found any. You can find them on http://666.subcultura.es for example.
[17:04] The related cookie for them is called Maria (the devs did have a good sense of humor)
[17:04] Checked RFC 1035, and that's definitely not legal: "