[00:00] *** xf2e has joined #archiveteam-bs
[00:08] *** oldcad has quit IRC (Quit: Leaving.)
[00:17] *** Aranje has quit IRC (Read error: Connection reset by peer)
[00:17] *** Aranje has joined #archiveteam-bs
[00:27] 2008 dailymail.co.uk sitemap urls are fully uploaded now
[00:28] *** mistym has quit IRC (Remote host closed the connection)
[00:35] *** Aranje has quit IRC (Ping timeout: 240 seconds)
[00:41] *** BlueMaxim has joined #archiveteam-bs
[00:42] *** mistym has joined #archiveteam-bs
[00:44] *** kyan has quit IRC (Quit: This computer has gone to sleep)
[00:47] *** Aranje has joined #archiveteam-bs
[01:35] trying to archive a site that uses ajax for pagination. is there any way to make archivebot click on things in phantomjs mode?
[01:36] *** kyan has joined #archiveteam-bs
[01:36] i'm grabbing MontherboardTV youtube channel
[01:37] its not that big it looks like
[01:42] I ta
[01:42] have to stop using google code. Eventually. I'm still using it all the time
[01:58] *** mistym has quit IRC (Remote host closed the connection)
[02:10] *** mistym has joined #archiveteam-bs
[02:11] *** bzc6p_ has joined #archiveteam-bs
[02:11] *** swebb sets mode: +o bzc6p_
[02:14] *** bzc6p has quit IRC (Read error: Operation timed out)
[02:35] *** Aranje has quit IRC (Read error: Connection reset by peer)
[02:36] *** Aranje has joined #archiveteam-bs
[02:38] So I'm interested in (and have been) backing up Quora
[02:49] *** primus104 has quit IRC (Leaving.)
[02:58] *** mistym has quit IRC (Remote host closed the connection)
[03:08] *** Asparagir has joined #archiveteam-bs
[03:09] *** robink has joined #archiveteam-bs
[03:23] *** mistym has joined #archiveteam-bs
[03:27] *** bzc6p_ is now known as bzc6p
[03:27] szalwia: AFAIK it doesn't. In such cases, I try to find out the number of pages from the source code and generate the URLs for the pages.
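(Editor's note on the 03:27 suggestion: a minimal sketch of generating one URL per page once the page count is known. The `page` query parameter and the example.com address are assumptions for illustration; the real parameter name has to be read from the target site's source.)

```python
from typing import List
from urllib.parse import urlencode


def page_urls(base_url: str, last_page: int, param: str = "page") -> List[str]:
    """Build one URL per page, from page 1 to last_page inclusive.

    `param` is a hypothetical query parameter name; check the site's own
    pagination links to find the real one.
    """
    # Append with "&" if the base URL already carries a query string.
    separator = "&" if "?" in base_url else "?"
    return [
        f"{base_url}{separator}{urlencode({param: number})}"
        for number in range(1, last_page + 1)
    ]


if __name__ == "__main__":
    # Print a URL list that could be fed to a crawler as a seed file.
    for url in page_urls("http://example.com/list", 3):
        print(url)
```

(This only helps when pages are addressable by GET; it does not cover pagination that is driven by POST requests.)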
[03:29] *** Asparagir has quit IRC (Asparagir)
[03:30] bzc6p: no way to do this in my case, the ajax call returns javascript that then gets eval'd ;x
[03:31] i wrote my own scraper that works, but it only gets me image urls https://gist.github.com/szalwia/d8658efd2f66a7584050
[03:31] If you analyze the JS code, you might find out what URLs are GETted with ajax
[03:33] *** Aranje has quit IRC (Read error: Connection reset by peer)
[03:34] *** Aranje has joined #archiveteam-bs
[03:34] bzc6p: already did and as i said, they return additional javascript that gets eval()'d
[03:35] *** Aranje has quit IRC (Read error: Connection reset by peer)
[03:35] *** Aranje has joined #archiveteam-bs
[03:36] szalwia: what is the site, if I may ask? (Not because I don't believe you, just to see that bastard myself.)
[03:37] bzc6p: ask.fm
[03:38] *** Aranje has quit IRC (Read error: Connection reset by peer)
[03:39] *** godane has quit IRC (Ping timeout: 306 seconds)
[03:40] bzc6p: that's how the "View more" button on user profiles works
[03:42] I must be logged in to use that button?
[03:44] bzc6p: no
[03:44] http://ask.fm/ameeraaxxxxxxx
[03:44] it's at the bottom of the page
[03:44] I see. It didn't worked for me, now I realized it needed cookies to work.
[03:45] A think I'll never understand.
[03:45] *thing
[03:46] It makes a POST request. Argh.
[03:46] *** Aranje has joined #archiveteam-bs
[03:48] *** godane has joined #archiveteam-bs
[03:49] modem keeps going out
[03:58] https://en.wikipedia.org/wiki/Resiniferatoxin hooooly shit
[04:05] countdown to youtube challenge
[04:05] hah
[04:07] *** mistym has quit IRC (Remote host closed the connection)
[04:08] szalwia: it's indeed awful. I still believe that someone with advanced knowledge in JS can find out what POST request is actually triggered. And it's probably also WARCable, at least I managed to capture some POSTy shit with webrecorder.io, so I guess it's possible. – If you don't mind, I don't go deeper into that awful JS, otherwise I'll go insane.
[04:09] szalwia: Also, ask.fm is a big site, thorough archiving would be a Warrior project one day when it's endangered. For archiving single profiles, you should try webrecorder.io, I did manage to archive e.g. Facebook pages with that
[04:10] so it may work with this one too.
[04:13] https://webrecorder.io/ looks down from here
[04:22] *** Asparagir has joined #archiveteam-bs
[04:30] szalwia: here too
[04:32] *** bzc6p sets mode: +o chfoo
[04:33] works here
[04:33] *** bzc6p sets mode: +oooo godane garyrh Infreq Kazzy
[04:33] *** bzc6p sets mode: +oooo Kenshin midas Start wp494
[04:33] aaaaaaaaa: the front page does, but the archiving page gives 504
[04:33] ah ok, sorry for the confusion then.
[04:34] aaaaaaaaa: no, me sorry, it works for other sites,
[04:34] it just doesn't like ask.fm apparently
[04:37] *** aaaaaaaaa has quit IRC (Leaving)
[04:37] Now the front page also 504s.
[04:39] *** mistym has joined #archiveteam-bs
[04:57] *** bzc6p has left
[05:09] *** Ravenloft has quit IRC (Ping timeout: 240 seconds)
[05:39] *** Start_ has joined #archiveteam-bs
[05:39] *** Start has quit IRC (Read error: Connection reset by peer)
[05:55] *** bsmith093 has quit IRC (Read error: Operation timed out)
[06:09] *** Aranje has quit IRC (Read error: Connection reset by peer)
[06:11] *** Start has joined #archiveteam-bs
[06:12] *** Start_ has quit IRC (Read error: Connection reset by peer)
[06:15] *** Aranje has joined #archiveteam-bs
[06:16] *** Aranje has quit IRC (Read error: Connection reset by peer)
[06:24] *** mistym has quit IRC (Remote host closed the connection)
[07:10] *** Asparagir has quit IRC (Asparagir)
[07:25] *** mistym has joined #archiveteam-bs
[07:28] *** yipdw has quit IRC (Quit: No Ping reply in 180 seconds.)
[07:30] *** yipdw has joined #archiveteam-bs
[07:34] *** mistym has quit IRC (Read error: Operation timed out)
[07:34] *** primus104 has joined #archiveteam-bs
[07:40] *** schbirid has joined #archiveteam-bs
[07:58] *** schbirid is now known as schbiridw
[07:58] *** schbiridw is now known as schbiwork
[08:06] *** toad2 has joined #archiveteam-bs
[08:08] *** toad1 has quit IRC (Read error: Operation timed out)
[08:29] *** primus104 has quit IRC (Leaving.)
[09:41] *** dx has quit IRC (Read error: Operation timed out)
[09:42] *** primus104 has joined #archiveteam-bs
[09:54] *** BlueMaxim has quit IRC (Read error: Connection reset by peer)
[10:09] *** dx has joined #archiveteam-bs
[10:12] *** godane has quit IRC (Ping timeout: 265 seconds)
[10:25] *** godane has joined #archiveteam-bs
[10:57] *** godane has quit IRC (Quit: Leaving.)
[11:06] *** godane has joined #archiveteam-bs
[11:38] *** xf2e has quit IRC (Ping timeout: 483 seconds)
[11:57] *** ohhdemgir has quit IRC (Read error: Connection reset by peer)
[12:00] *** balrog has quit IRC (Read error: Operation timed out)
[12:01] *** ripvanwin has quit IRC (Read error: Operation timed out)
[12:02] *** ohhdemgir has joined #archiveteam-bs
[12:05] *** dashcloud has quit IRC (Read error: Operation timed out)
[12:08] *** balrog has joined #archiveteam-bs
[12:08] *** swebb sets mode: +o balrog
[12:11] *** dashcloud has joined #archiveteam-bs
[12:46] *** Stilett0 has joined #archiveteam-bs
[12:50] *** Stiletto has quit IRC (Ping timeout: 370 seconds)
[13:11] *** primus104 has quit IRC (Leaving.)
[13:26] SketchCow: a brief interview with the cuba SNET guy: http://media2.wptv.com/video/video_studio/2015/01/26/Jovenes_cubanos_construyeron_en_secreto__250724.mp4
[13:27] we have video and words directly from him now
[13:27] there are no subs sadly
[13:43] *** Panasonic has joined #archiveteam-bs
[13:43] *** Panasonic is now known as Ravenloft
[13:44] https://retrogamingnr.wordpress.com/2015/07/06/17/
[13:44] this is a nice write up
[13:47] except when it isnt
[13:51] the AVS by bunnyboy will be top class, he dismissed it, in a not elegant way, showing a pic of a proto encarnation of the HDMI solution that involved a top loader and was cancelled, it happened before the change to a brand new FPGA system, that is fully developed and in production
[14:30] *** mistym has joined #archiveteam-bs
[14:40] *** mistym has quit IRC (Remote host closed the connection)
[15:07] *** mistym has joined #archiveteam-bs
[16:08] *** primus104 has joined #archiveteam-bs
[16:43] *** goekesmi_ has quit IRC (Remote host closed the connection)
[16:45] *** Ravenloft has quit IRC (Ping timeout: 606 seconds)
[16:50] *** mistym has quit IRC (Remote host closed the connection)
[16:55] *** goekesmi has joined #archiveteam-bs
[17:05] *** mistym has joined #archiveteam-bs
[17:09] *** primus104 has quit IRC (Leaving.)
[17:39] *** dashcloud has quit IRC (Read error: Operation timed out)
[17:44] *** aaaaaaaaa has joined #archiveteam-bs
[17:44] *** swebb sets mode: +o aaaaaaaaa
[17:45] *** dashcloud has joined #archiveteam-bs
[17:53] nice, booting into rescue mode at oneprovider.com and port 22 is filtered
[17:56] oh wait, it just took a "while"
[18:05] *** ripvanwin has joined #archiveteam-bs
[18:29] *** dashcloud has quit IRC (Read error: Operation timed out)
[18:35] *** dashcloud has joined #archiveteam-bs
[18:43] *** dashcloud has quit IRC (Read error: Operation timed out)
[18:59] *** dashcloud has joined #archiveteam-bs
[19:02] *** primus104 has joined #archiveteam-bs
[19:18] *** aaaaaaaaa has quit IRC (Leaving)
[19:33] *** aaaaaaaaa has joined #archiveteam-bs
[19:33] *** swebb sets mode: +o aaaaaaaaa
[19:42] for future reference, to mirror an entire github user (repos only): curl https://api.github.com/users/$USERNAME/repos | jq -r .[].clone_url | xargs -L 1 git clone --mirror
[19:42] nice
[19:48] xmc: jq is basically magic. :P
[19:48] yuuup
[19:48] but the docs are shit, so:
[19:48] -r means raw output, ie. no quote marks around the strings
[19:48] i used it the other day in a makefile to avoid writing a proper client for an api
[19:48] .[] means "for every item in the array"
[19:49] .propname gets the propname
[19:49] xmc: hehe
[19:49] honestly syntax is pretty simple, it just has really poor docs
[19:49] lol
[19:49] jq | curl | jq --exit-status '.success == true' && success-stuff
[19:50] oh, heh, clever
[19:50] yeah i was pretty pleased
[19:50] didn't know about exit-status
[19:50] but I can guess what it does
[19:50] "exit 0 if the expression is true"
[19:50] yep
[19:50] possibly also something like "or if it's a number, use that as exit code"
[19:50] i could have done just '.success' but i don't always trust truthiness
[19:50] (wild guess)
[19:50] yeah
[19:50] good call :P
[19:51] too many people do.. it's like one of the top 3 things I end up correcting
[19:51] when reviewing people's code
[19:51] "you should not do that. use == null instead"
[19:51] (it's related to truthiness)
[19:53] http://www.cs.utah.edu/~gk/atwork/
[19:54] *** aaaaaaaaa has quit IRC (Leaving)
[19:54] curl https://api.github.com/users/"${USERNAME}"/repos | grep -Eo "https://github.com/"${USERNAME}"/.*\.git" | xargs -L 1 git clone --mirror
[19:54] :P
[19:54] schbiwork is it in their Paris location?
[19:55] HCross: yeah, currently we reached "Can we run some test on the server? This will cause a short downtime."
[19:55] Its basically Online.net rebranded
[19:55] schbiwork: good bit less reliable
[19:55] http://www.online.net/en/dedicated-server - go straight to the source
[19:56] HCross: i know i know but their offer was good
[19:56] schbiwork: you can also sed -e 's/"/\n/g' | grep '\.git$'
[19:56] :)
[20:27] *** mistym has quit IRC (Remote host closed the connection)
[20:35] *** dashcloud has quit IRC (Read error: Operation timed out)
[20:38] *** dashcloud has joined #archiveteam-bs
[20:40] *** mistym has joined #archiveteam-bs
[21:00] *** mistym has quit IRC (Remote host closed the connection)
[21:16] *** mistym has joined #archiveteam-bs
[21:54] *** RichardG has quit IRC (Remote host closed the connection)
[21:58] *** RichardG has joined #archiveteam-bs
[22:04] *** kyan has quit IRC (Quit: This computer has gone to sleep)
[22:28] *** aaaaaaaaa has joined #archiveteam-bs
[22:28] *** swebb sets mode: +o aaaaaaaaa
[22:32] so I just discovered my second least favorite call to get from my family.
[22:34] "my computer is acting funny." "what were you doing before that?" "I went to a website and the screen was all red, so my neighbor said to try it with internet explorer."
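(Editor's note on the 19:42 one-liner: a rough Python equivalent of the `jq -r .[].clone_url | xargs -L 1 git clone --mirror` part, as a sketch only. The function names are made up for illustration, and the JSON is assumed to be the array returned by the `/users/$USERNAME/repos` endpoint. As far as I know that endpoint is paginated, so a single request may not list every repository of a user with many repos.)

```python
import json
import subprocess
from typing import List


def clone_urls(repos_json: str) -> List[str]:
    """Mimic `jq -r '.[].clone_url'`: one clone URL per repo object."""
    return [repo["clone_url"] for repo in json.loads(repos_json)]


def mirror_all(urls: List[str], dry_run: bool = True) -> List[List[str]]:
    """Build (and optionally run) one `git clone --mirror` per URL.

    With dry_run=True nothing is executed; the commands are only returned
    so they can be inspected first.
    """
    commands = [["git", "clone", "--mirror", url] for url in urls]
    if not dry_run:
        for command in commands:
            # check=True stops at the first failing clone.
            subprocess.run(command, check=True)
    return commands
```

(Parsing the JSON rather than grepping the raw text, as in the 19:54 variant, avoids matching URLs that happen to appear in other fields.)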
[22:35] ;_;
[22:35] im hoping it was at least 10 or 11
[22:35] aaaaaaaaa: you're going to want to go over there right now if you can
[22:35] and ensure it's not a cryptolocker
[22:36] that is where I was
[22:36] ah ok
[22:36] also teach them to not ignore 'there's danger ahead' screens... >.>
[22:36] (because $5 that that was the "all red screen" they were complaining about)
[22:36] opendns + adblock everything + ghostery/disconnect to block oauth prompts
[22:37] RedType: fuck ghostery
[22:37] privacy badger
[22:37] yeah, the default web browser is chrome, so that is what I am thinking too.
[22:37] (ghostery is operated by a marketing company who collect and resell data on tracker blocking)
[22:37] (seriously)
[22:37] "this site redirected me to google? it's safe to log in to google!"
[22:37] seriously?
[22:37] yes, seriously
[22:37] heh, guess so
[22:37] hence, fuck ghostery
[22:37] privacy badger is EFF
[22:37] no such crap
[22:37] :p
[22:38] it also uses heuristics rather than blocklists
[22:38] false positive rate is nonzero, but very very low
[22:38] I'm thinking of just getting them a chromebook as my birthday present to myself.
[22:38] lol
[22:38] id like some proof on that claim about ghostery though, it looks like they have that as opt in
[22:38] aaaaaaaaa: chromebooks are not immune to malware
[22:38] the claim that they're collecting data
[22:38] not the claim of who owns what
[22:39] RedType: that may have been a recent change, but frankly, I would not trust a "keep marketing companies out" extension developed by a marketing company
[22:39] regardless of opt in or opt out
[22:39] massive conflict of interest
[22:39] (the issue is pretty well documented from around a year or two ago)
[22:40] also, aaaaaaaaa, you have installed unchecky, right?
[22:40] *** xf2e has joined #archiveteam-bs
[22:41] well documented but you made the claim. anyways, fwiw i use noscript instead of That Jazz
[22:41] *** dashcloud has quit IRC (Remote host closed the connection)
[22:42] never heard of unchecky, but they don't have admin privleges. Although that is more an ignorance defense than malware defense, it seems.
[22:43] aaaaaaaaa: you'll want to install it regardless
[22:43] it just auto-declines any bundled 'offers' (ie. malware)
[22:43] *** dashcloud has joined #archiveteam-bs
[22:43] RedType: yes, and too busy to dig it up right now :P
[22:43] just wanted to give a quick tip
[22:43] also wtf, a frying pan has gone missing
[22:43] fair enough
[22:44] also, one of the best defences against cryptolockers is offline backups or backups that dont allow you to over write previous revisions
[22:45] that latter one is actually a difficult to come by solution for consumers
[22:45] at least in an open source/free form
[22:49] A friend at work has it set up so user folders are shared on his home network and then has another computer pull any changes over that way.
[22:52] deaugh
[22:53] *** dashcloud has quit IRC (Read error: Operation timed out)
[22:58] *** dashcloud has joined #archiveteam-bs
[23:25] *** Asparagir has joined #archiveteam-bs
[23:26] *** w0rp has quit IRC (Read error: Operation timed out)
[23:45] *** Jonimus has quit IRC (Ping timeout: 370 seconds)
[23:56] *** Jonimus has joined #archiveteam-bs
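(Editor's note on the 22:44 to 22:49 backup discussion: a minimal sketch of a pull-style backup that never overwrites an earlier revision. It is meant to run on the backup machine, which reads the shared folder and writes each run into a fresh timestamped directory. The function name and directory layout are assumptions for illustration, not the setup described in the log.)

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def pull_snapshot(source: Path, backup_root: Path, stamp: Optional[str] = None) -> Path:
    """Copy `source` into a new directory under `backup_root`.

    Each run gets its own directory named after `stamp` (UTC time by
    default). shutil.copytree refuses to write into an existing
    directory, so an earlier snapshot is never overwritten.
    """
    if stamp is None:
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    destination = backup_root / stamp
    # Raises FileExistsError if a snapshot with this stamp already exists.
    shutil.copytree(source, destination)
    return destination
```

(Full copies are wasteful on disk; the point of the sketch is only the direction of the copy and the refusal to overwrite. The protection also depends on the infected machine having no write access to the backup directory.)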