[00:03] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[00:04] fantastic, this worked: -o %(id)s/%(title)s.%(ext)s
[00:04] now to archive all youtubes
[00:08] *** dashcloud has quit IRC (Read error: Operation timed out)
[00:11] *** dashcloud has joined #archiveteam-bs
[00:11] *** bwn__ is now known as bwn
[00:13] *** BlueMaxim has joined #archiveteam-bs
[00:19] JW_work: JesseW: i don't use it, but i had come across that warcreate extension, it looked like he was working on adding a 'record' type thing similar to what you were talking about
[00:19] but it just did a snapshot of the current page when I had played with it
[00:21] https://github.com/machawk1/warcreate
[00:21] *** JesseW has joined #archiveteam-bs
[00:25] *** Balrog_ has joined #archiveteam-bs
[00:30] *** Balrog_ has quit IRC ( hung she dong)
[00:30] be awesome if there was a firefox version
[00:31] *** Start has joined #archiveteam-bs
[00:41] Delimiter is still AWOL. would not recommend
[00:41] and I thought OVH had bad customer service
[00:43] *** BlueMaxim has quit IRC (Quit: Leaving)
[00:44] bwn: https://github.com/machawk1/warcreate/issues/66
[00:44] thanks for pointing me at warcreate
[01:03] *** Honno has quit IRC (Quit: Leaving)
[01:06] * Yoshimura thanks VADemon. Could use that ;) Wish you as well.
[01:11] joepie91: if you know the lowendtalk guy maybe you can vouch for me, new account ivank
[01:12] *** wp494 has quit IRC (Read error: Connection reset by peer)
[01:17] *** wp494 has joined #archiveteam-bs
[01:33] *** tomwsmf-a has joined #archiveteam-bs
[02:01] *** wp494 has quit IRC (Read error: Operation timed out)
[02:01] *** wp494 has joined #archiveteam-bs
[02:06] *** VADemon has quit IRC (Quit: left4dead)
[02:26] *** atrocity has quit IRC (Ping timeout: 260 seconds)
[02:28] *** atrocity has joined #archiveteam-bs
[02:29] FUCK
[02:29] power went out here, so lost my openwith shit
[02:51] *** bwn has quit IRC (Read error: Operation timed out)
[02:57] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[02:59] D:
[03:12] The doom with URL shorteners is terrible. Not sure what is worse: the shorteners, or the new ones with custom URLs and long crap.
[03:13] Also ads, and more. There are a lot of uncrawled ones, too. *looks at JesseW with smile*
[03:14] * xmc smiles creepily
[03:14] er, i'm not that creepy
[03:14] Is there metadata about megawarcs anywhere on the archive? Would like to systematically go through some files, indexes or pages.
[03:15] megawarcs are just big warcs
[03:15] what are you looking for?
[03:15] I know. I meant so that I do not have to click through each page on AI.
[03:15] *** tomwsmf-a has joined #archiveteam-bs
[03:15] I meant IA... looking for HTML pages to extract data from.
[03:16] I have uses both related to AT and to two or three other projects, so it would be handy.
[03:16] First step would be the metadata, second the index, and last, sections of the warcs fetched by range requests to get only the HTML.
[03:17] And only the fresher ones, depending on content. Some vast sites are kind of useless (except for the index itself or the comments).
[03:27] *** RichardG has quit IRC (Ping timeout: 260 seconds)
[03:32] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[03:48] *** ErkDog has quit IRC (Read error: Operation timed out)
[04:01] *** godane has quit IRC (Quit: Leaving.)
[04:05] * Yoshimura found one problem when (ordinary) people get stuff for free... they then expect everything to be super f.... nice and free at the very least, if not to be given gifts.
[04:06] *** ErkDog has joined #archiveteam-bs
[04:21] *** Crocatowa has joined #archiveteam-bs
[04:28] *** ErkDog has quit IRC (Read error: Operation timed out)
[04:33] *** ErkDog has joined #archiveteam-bs
[04:40] dem ordinary people
[04:41] *** bwn has joined #archiveteam-bs
[04:50] *** BlueMaxim has joined #archiveteam-bs
[04:55] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[05:02] *** Sk1d has joined #archiveteam-bs
[06:22] *** hawc145 has joined #archiveteam-bs
[06:27] *** HCross has quit IRC (Read error: Operation timed out)
[06:36] *** schbirid has joined #archiveteam-bs
[06:49] *** JesseW has quit IRC (Quit: Leaving.)
[06:49] *** JesseW has joined #archiveteam-bs
[07:02] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[07:06] damn
[07:07] new EU privacy laws give European privacy watchdogs the authority to impose fines of up to 20 million euro or 4% of the *global* revenue for significant violations, and 10 million / 2% for more 'formal' violations
[07:25] *** metalcamp has joined #archiveteam-bs
[07:33] *** Medowar has joined #archiveteam-bs
[07:34] *** VADemon has joined #archiveteam-bs
[07:59] *** mismatch_ has quit IRC (Remote host closed the connection)
[08:01] *** mismatch_ has joined #archiveteam-bs
[08:29] *** godane has joined #archiveteam-bs
[08:36] *** hawc145 is now known as HCross
[09:24] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[09:56] *** bwn has quit IRC (Read error: Operation timed out)
[10:05] *** bwn has joined #archiveteam-bs
[10:55] *** metalcamp has joined #archiveteam-bs
[10:56] Would this truck blend in anywhere? https://twitter.com/textfiles/status/722094405931397121/photo/1
[11:36] *** RichardG has joined #archiveteam-bs
[11:38] *** Medowar has quit IRC (Quit: Connection closed for inactivity)
[11:50] who would steal from a library...
[11:52] maybe they just borrowed it
[11:56] *** Lord_Nigh has quit IRC (Read error: Operation timed out)
[12:00] *** Lord_Nigh has joined #archiveteam-bs
[12:04] *** Lord_Nigh has quit IRC (Read error: Operation timed out)
[12:18] *** Medowar has joined #archiveteam-bs
[12:24] *** BlueMaxim has quit IRC (Quit: Leaving)
[12:41] *** Lord_Nigh has joined #archiveteam-bs
[12:45] *** vitzli has joined #archiveteam-bs
[13:30] *** tomwsmf-a has joined #archiveteam-bs
[13:31] *** RichardG has quit IRC (Ping timeout: 272 seconds)
[13:34] That truck
[13:42] *** hook54321 has joined #archiveteam-bs
[13:47] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[14:11] *** RichardG has joined #archiveteam-bs
[14:16] *** Start has quit IRC (Quit: Disconnected.)
[15:03] *** RichardG has quit IRC (Read error: Operation timed out)
[15:03] *** RichardG has joined #archiveteam-bs
[15:09] *** tomwsmf-a has joined #archiveteam-bs
[15:20] *** RichardG has quit IRC (Ping timeout: 250 seconds)
[15:21] *** RichardG has joined #archiveteam-bs
[15:29] *** JesseW has joined #archiveteam-bs
[15:30] *** godane has quit IRC (Read error: Operation timed out)
[15:31] *** Start has joined #archiveteam-bs
[15:44] *** RichardG has quit IRC (Ping timeout: 244 seconds)
[15:50] *** RichardG has joined #archiveteam-bs
[15:50] *** hook54321 has quit IRC (Quit: Connection closed for inactivity)
[15:51] Sorry again, but newsbuddy could do with more power and grabbers please
[15:55] *** Start has quit IRC (Ping timeout: 260 seconds)
[15:56] HCross2: Define power and grabber?
[15:57] Grabber = pipe, power = cpu?
[15:57] Yeah, bandwidth and CPU really
[16:00] *** Start has joined #archiveteam-bs
[16:01] I could provide BW or CPU, but not both at the same time currently.
[16:01] Well, both, but not together, i.e. at different locations.
[16:02] If you do not need a dedicated box, I can give you a container; I use 1/100 to 5/100 of the available BW atm.
[16:05] *** Start has quit IRC (Remote host closed the connection)
[16:05] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[16:05] >Container
[16:05] We really should get a docker image. If I have time, I can create one.
[16:05] but right now, time is an issue.
[16:07] Medowar: Yeah, I can provide HCross2 a Docker container with Ubuntu stuff (phusion/baseimage).
[16:08] yeah i have done the same.
[16:08] Btw, does anyone know about a high-bandwidth pipeline?
[16:08] But with a debian base please
[16:08] Maybe not needed at all; hltv is going offline in a few days.
[16:08] And wayback lacks tons of it
[16:08] you mean high bandwidth servers?
[16:09] Nah, archivebot. No one seems to care, and the site seems to be under heavy load, so the warrior would make no sense
[16:09] Someone might try to reach them or something, but announcing they will shut down in a few days sucks.
[16:10] After people realized that, I guess more people are crawling them personally or something.
[16:10] http://www.hltv.org/?pageid=86&galleryid=7880
[16:10] Example page. Load speed, and I think this one is not in wayback either.
[16:11] hltv is going offline?
[16:11] I announced that at least twice on the main channel
[16:11] No one cared or noticed. Yes, on April 23rd.
[16:12] wow. Is there an official announcement anywhere?
[16:12] Yes, it was on twitter, I think
[16:12] https://twitter.com/hltvorg_/status/722083587357544448
[16:13] But it may or may not be a hoax, I do not know.
[16:13] fake. Wrong twitter account.
[16:13] The twitter handle sounds sketchy. But someone on the wiki already said it's valuable. So even if it is a confirmed hoax, we should still crawl it after the 23rd.
[16:14] it has literally nothing on it other than the announcement.
[16:14] https://twitter.com/HLTVORG
[16:14] this is the original account
[16:14] also, the creator announced 9 months ago that he is going full-time on hltv, so I don't think it is shutting down
[16:14] http://www.hltv.org/?pageid=135&userid=1&blogid=10102
[16:16] and it is the most important CSGO news site. It has dedicated staff to do interviews at events and stuff
[16:17] afk 30 min, driving home
[16:22] Alright, then I guess the best strategy would be to wait and fetch the site once a year.
[16:22] The bot might have space problems due to the galleries, I do not know.
[16:23] *** SimpBrain has joined #archiveteam-bs
[16:47] *** bwn_ has joined #archiveteam-bs
[16:59] *** bwn has quit IRC (Read error: Operation timed out)
[17:05] newsgrabber on warrior? if so, i can give you like 40/40
[17:06] NOPE
[17:06] It isnty
[17:06] isnt
[17:07] :/
[17:08] yuku it is, lol
[17:11] *** vitzli has quit IRC (Quit: Leaving)
[17:14] HCross: had a chat with "Michael" who tells me "service will be live by Friday" because that's when they set up their drives in a batch
[17:14] and still no response to tickets or emails
[17:14] pretty sure I'm going to be out $130 on my drive
[17:15] Ouch :/
[17:18] Are the drives over at their DC now?
[17:18] I have no idea. they received the drive last Wednesday
[17:19] maybe it's already been sold for their hookers-and-blow fund
[17:19] Then surely it would be in last Friday's batch if they do it weekly
[17:19] nah, more like their "Downtime poptart fund"
[17:30] *** jspiros has quit IRC (Read error: Operation timed out)
[17:34] *** jspiros has joined #archiveteam-bs
[17:45] *** Start has joined #archiveteam-bs
[17:46] *** Honno has joined #archiveteam-bs
[17:53] *** JW_work1 has joined #archiveteam-bs
[17:59] *** JW_work has quit IRC (Ping timeout: 370 seconds)
[18:08] I opened a ticket to get them to cancel and return my drive
[18:08] fuckers will probably try to bill me $25 for packing the drive
[18:10] pay it, then speak to your CC company
[18:10] yeah
[18:10] but wait until the drive is in your hand before
[18:13] I'm out $39 just for shipping back and forth
[18:13] last month I was out $28 for shipping smoke-filled PS3s back and forth
[18:13] it's good to be a shipping co
[18:14] yeah, that does sound a tad expensive though. I sent a 2.5-inch disk from London to LA for £14 the other month
[18:15] *** Start has quit IRC (Quit: Disconnected.)
[18:16] Took less than 48 hours to reach LA, but then another 2 weeks to get through customs
[18:16] heh
[18:17] Yep
[18:18] *** Start has joined #archiveteam-bs
[18:19] something to do with sending HDDs from the EU being risky or something
[18:53] Yeah, if it goes by air or by ship... air means radiation from space.
[18:54] Ship might be OK but slow, and then there are the temperatures.
[18:54] Transport over the wire with a special-purpose application and protocol over UDP works best (scientists have those, and they are free or OSS).
[19:00] no, probably more like the contents of the drive
[19:00] pretty sure they're fine with the whole air travel bit in general
[19:02] it was empty too
[19:03] Kazzy: Cosmic radiation = damaging the bits on the magnetic surface?
[19:04] Yoshimura: wrap it in tin foil, that blocks all the rads
[19:05] Nope.
[19:05] lead foil
[19:05] i don't lose my bits when i go on a plane, why would a metal thing
[19:06] Density.
[19:06] * bwn_ makes a foil hat
[19:06] Also, cosmic radiation is fast as hell, so shielding does not do much.
[19:06] wait that's rude
[19:09] Electrical field around the drive :D
[19:15] Yoshimura: planes are fast as hell
[19:18] ^ nearly 11 hours from London to LA
[19:23] *** bwn_ has quit IRC (Read error: Operation timed out)
[19:27] *** schbirid has quit IRC (Quit: Leaving)
[19:32] Yoshimura: I don't see what the speed of the particles has to do with shielding
[19:32] things can be and are shielded from cosmic rays, otherwise the satellites orbiting Earth would have issues
[19:33] Frogging: It goes through, only mountains help. Yeah, it can be done, but it's costly.
[19:33] pretty sure they don't have mountains in orbit
[19:33] Nope, but they have resistant storage media made for that
[19:33] they have shielding
[19:34] the microchips aren't special, they're just shielded
[19:36] Shield your disk and send it instead of uploading, then
[19:37] *** godane has joined #archiveteam-bs
[19:44] well, yes. that's what was being suggested. Your objection was that "the radiation is too fast so shielding doesn't work", remember?
[19:45] *** Start has quit IRC (Quit: Disconnected.)
[19:46] *** bwn_ has joined #archiveteam-bs
[19:47] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[20:13] *** Start has joined #archiveteam-bs
[20:24] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[20:30] *** powerKite has joined #archiveteam-bs
[20:33] * zino is trying to remember what free forum hosting service his lost forum was on.
[20:35] invisionfree?
[20:36] I think the worst thing about archiving an ARG
[20:36] is that you end up having to ***DO THE PUZZLES AGAIN*** to find out what you need to archive
[20:37] Heh.
[20:38] xmc: I think the domain for the forum contained "easyforum.com" or something.
[20:40] forumotion?
[20:40] Hmm. Nope.
[20:41] anyway, is there a Megaswf archive I just don't know about or something?
[20:41] or am I just fucked in regards to getting those SWFs
[20:46] judging by the lack of responses, it's probably the latter
[20:47] Quite possibly
[20:48] *** Medowar has quit IRC (Quit: Connection closed for inactivity)
[20:49] :P
[21:00] *** powerKite has quit IRC (Quit: Page closed)
[21:10] *** Start has quit IRC (Quit: Disconnected.)
[22:17] https://twitter.com/textfiles/status/722530539006214146
[22:19] *** ErkDog has quit IRC (Read error: Operation timed out)
[22:20] JW_work1: Great! I choose to believe that it was my retweet that made the difference...
[22:21] I'm just curious what damage, if any, there will be to it.
[22:21] *** BlueMaxim has joined #archiveteam-bs
[22:21] Hopefully if there's damage to the paint job, they can get the artist to fix it
[22:23] Is there a picture of the van anywhere?
[22:24] https://twitter.com/textfiles/status/722094405931397121
[22:24] Thanks ;)
[22:26] *** ErkDog has joined #archiveteam-bs
[22:32] OK, pipeline: I would like to run one.
[22:34] vbox + https://archive.org/download/archiveteam-warrior/archiveteam-warrior-v2-20121008.ova
[22:34] If anyone could help or provide more info, I would be glad. I did not care until now, when apparently the pipes are loaded, stalling, etc. If they all worked, there would be enough BW.
[22:34] Kazzy: Archivebot :P Already running all projects on the warrior simultaneously at concurrency 6 (which in practice is lower due to lack of work)
[22:35] archivebot is a ton more involved
[22:35] basically don't bother even trying unless you can provide 50/50 (ideally 100/100 line) for 2-3 months minimum at 100% uptime, guaranteed with no filtering
[22:36] if you pass all that, proceed to https://github.com/ArchiveTeam/ArchiveBot/blob/master/INSTALL.pipeline
[22:37] Kazzy: 100/100
[22:37] Atm, at least once.
[22:38] Not even SLAs promise 100%; they promise 99.9%
[22:39] And filtering is needed and used almost everywhere; people just pretend it's not (IPS, IDS)
[22:39] SketchCow: we are up to 2008-07-05 with Funny or Die archive videos
[22:40] But the providers do it to mitigate DDoS while retaining the real bandwidth, plus residual DoS.
[22:49] If you want your pipeline to only handle !ao/!archiveonly jobs, run it with the AO_ONLY environment variable set.
[22:50] Sounds like a job for me, starting small. Sounds great.
[22:56] *** Honno has quit IRC (Quit: Leaving)
[23:23] *** Rickster has quit IRC (Ping timeout: 260 seconds)
[23:34] *** Rickster has joined #archiveteam-bs
[23:38] *** VADemon has quit IRC (Quit: left4dead)
[23:50] *** JesseW has joined #archiveteam-bs