#archiveteam-bs 2015-07-15,Wed


Time Nickname Message
00:00 🔗 xf2e has joined #archiveteam-bs
00:08 🔗 oldcad has quit IRC (Quit: Leaving.)
00:17 🔗 Aranje has quit IRC (Read error: Connection reset by peer)
00:17 🔗 Aranje has joined #archiveteam-bs
00:27 🔗 godane 2008 dailymail.co.uk sitemap urls are fully uploaded now
00:28 🔗 mistym has quit IRC (Remote host closed the connection)
00:35 🔗 Aranje has quit IRC (Ping timeout: 240 seconds)
00:41 🔗 BlueMaxim has joined #archiveteam-bs
00:42 🔗 mistym has joined #archiveteam-bs
00:44 🔗 kyan has quit IRC (Quit: This computer has gone to sleep)
00:47 🔗 Aranje has joined #archiveteam-bs
01:35 🔗 szalwia trying to archive a site that uses ajax for pagination. is there any way to make archivebot click on things in phantomjs mode?
01:36 🔗 kyan has joined #archiveteam-bs
01:36 🔗 godane i'm grabbing MotherboardTV youtube channel
01:37 🔗 godane its not that big it looks like
01:42 🔗 kyan I ta
01:42 🔗 kyan have to stop using google code. Eventually. I'm still using it all the time
01:58 🔗 mistym has quit IRC (Remote host closed the connection)
02:10 🔗 mistym has joined #archiveteam-bs
02:11 🔗 bzc6p_ has joined #archiveteam-bs
02:11 🔗 swebb sets mode: +o bzc6p_
02:14 🔗 bzc6p has quit IRC (Read error: Operation timed out)
02:35 🔗 Aranje has quit IRC (Read error: Connection reset by peer)
02:36 🔗 Aranje has joined #archiveteam-bs
02:38 🔗 xf2e So I'm interested in (and have been) backing up Quora
02:49 🔗 primus104 has quit IRC (Leaving.)
02:58 🔗 mistym has quit IRC (Remote host closed the connection)
03:08 🔗 Asparagir has joined #archiveteam-bs
03:09 🔗 robink has joined #archiveteam-bs
03:23 🔗 mistym has joined #archiveteam-bs
03:27 🔗 bzc6p_ is now known as bzc6p
03:27 🔗 bzc6p szalwia: AFAIK it doesn't. In such cases, I try to find out the number of pages from the source code and generate the URLs for the pages.
03:29 🔗 Asparagir has quit IRC (Asparagir)
03:30 🔗 szalwia bzc6p: no way to do this in my case, the ajax call returns javascript that then gets eval'd ;x
03:31 🔗 szalwia i wrote my own scraper that works, but it only gets me image urls https://gist.github.com/szalwia/d8658efd2f66a7584050
03:31 🔗 bzc6p If you analyze the JS code, you might find out what URLs are GETted with ajax
03:33 🔗 Aranje has quit IRC (Read error: Connection reset by peer)
03:34 🔗 Aranje has joined #archiveteam-bs
03:34 🔗 szalwia bzc6p: already did and as i said, they return additional javascript that gets eval()'d
03:35 🔗 Aranje has quit IRC (Read error: Connection reset by peer)
03:35 🔗 Aranje has joined #archiveteam-bs
03:36 🔗 bzc6p szalwia: what is the site, if I may ask? (Not because I don't believe you, just to see that bastard myself.)
03:37 🔗 szalwia bzc6p: ask.fm
03:38 🔗 Aranje has quit IRC (Read error: Connection reset by peer)
03:39 🔗 godane has quit IRC (Ping timeout: 306 seconds)
03:40 🔗 szalwia bzc6p: that's how the "View more" button on user profiles works
03:42 🔗 bzc6p I must be logged in to use that button?
03:44 🔗 szalwia bzc6p: no
03:44 🔗 szalwia http://ask.fm/ameeraaxxxxxxx
03:44 🔗 szalwia it's at the bottom of the page
03:44 🔗 bzc6p I see. It didn't work for me, now I realize it needs cookies to work.
03:45 🔗 bzc6p A think I'll never understand.
03:45 🔗 bzc6p *thing
03:46 🔗 bzc6p It makes a POST request. Argh.
03:46 🔗 Aranje has joined #archiveteam-bs
03:48 🔗 godane has joined #archiveteam-bs
03:49 🔗 godane modem keeps going out
03:58 🔗 yipdw https://en.wikipedia.org/wiki/Resiniferatoxin hooooly shit
04:05 🔗 DFJustin countdown to youtube challenge
04:05 🔗 yipdw hah
04:07 🔗 mistym has quit IRC (Remote host closed the connection)
04:08 🔗 bzc6p szalwia: it's indeed awful. I still believe that someone with advanced knowledge in JS can find out what POST request is actually triggered. And it's probably also WARCable, at least I managed to capture some POSTy shit with webrecorder.io, so I guess it's possible. – If you don't mind, I don't go deeper into that awful JS, otherwise I'll go insane.
04:09 🔗 bzc6p szalwia: Also, ask.fm is a big site, thorough archiving would be a Warrior project one day when it's endangered. For archiving single profiles, you should try webrecorder.io, I did manage to archive e.g. Facebook pages with that
04:10 🔗 bzc6p so it may work with this one too.
04:13 🔗 szalwia https://webrecorder.io/ looks down from here
04:22 🔗 Asparagir has joined #archiveteam-bs
04:30 🔗 bzc6p szalwia: here too
04:32 🔗 bzc6p sets mode: +o chfoo
04:33 🔗 aaaaaaaaa works here
04:33 🔗 bzc6p sets mode: +oooo godane garyrh Infreq Kazzy
04:33 🔗 bzc6p sets mode: +oooo Kenshin midas Start wp494
04:33 🔗 bzc6p aaaaaaaaa: the front page does, but the archiving page gives 504
04:33 🔗 aaaaaaaaa ah ok, sorry for the confusion then.
04:34 🔗 bzc6p aaaaaaaaa: no, me sorry, it works for other sites,
04:34 🔗 bzc6p it just doesn't like ask.fm apparently
04:37 🔗 aaaaaaaaa has quit IRC (Leaving)
04:37 🔗 bzc6p Now the front page also 504s.
04:39 🔗 mistym has joined #archiveteam-bs
04:57 🔗 bzc6p has left
05:09 🔗 Ravenloft has quit IRC (Ping timeout: 240 seconds)
05:39 🔗 Start_ has joined #archiveteam-bs
05:39 🔗 Start has quit IRC (Read error: Connection reset by peer)
05:55 🔗 bsmith093 has quit IRC (Read error: Operation timed out)
06:09 🔗 Aranje has quit IRC (Read error: Connection reset by peer)
06:11 🔗 Start has joined #archiveteam-bs
06:12 🔗 Start_ has quit IRC (Read error: Connection reset by peer)
06:15 🔗 Aranje has joined #archiveteam-bs
06:16 🔗 Aranje has quit IRC (Read error: Connection reset by peer)
06:24 🔗 mistym has quit IRC (Remote host closed the connection)
07:10 🔗 Asparagir has quit IRC (Asparagir)
07:25 🔗 mistym has joined #archiveteam-bs
07:28 🔗 yipdw has quit IRC (Quit: No Ping reply in 180 seconds.)
07:30 🔗 yipdw has joined #archiveteam-bs
07:34 🔗 mistym has quit IRC (Read error: Operation timed out)
07:34 🔗 primus104 has joined #archiveteam-bs
07:40 🔗 schbirid has joined #archiveteam-bs
07:58 🔗 schbirid is now known as schbiridw
07:58 🔗 schbiridw is now known as schbiwork
08:06 🔗 toad2 has joined #archiveteam-bs
08:08 🔗 toad1 has quit IRC (Read error: Operation timed out)
08:29 🔗 primus104 has quit IRC (Leaving.)
09:41 🔗 dx has quit IRC (Read error: Operation timed out)
09:42 🔗 primus104 has joined #archiveteam-bs
09:54 🔗 BlueMaxim has quit IRC (Read error: Connection reset by peer)
10:09 🔗 dx has joined #archiveteam-bs
10:12 🔗 godane has quit IRC (Ping timeout: 265 seconds)
10:25 🔗 godane has joined #archiveteam-bs
10:57 🔗 godane has quit IRC (Quit: Leaving.)
11:06 🔗 godane has joined #archiveteam-bs
11:38 🔗 xf2e has quit IRC (Ping timeout: 483 seconds)
11:57 🔗 ohhdemgir has quit IRC (Read error: Connection reset by peer)
12:00 🔗 balrog has quit IRC (Read error: Operation timed out)
12:01 🔗 ripvanwin has quit IRC (Read error: Operation timed out)
12:02 🔗 ohhdemgir has joined #archiveteam-bs
12:05 🔗 dashcloud has quit IRC (Read error: Operation timed out)
12:08 🔗 balrog has joined #archiveteam-bs
12:08 🔗 swebb sets mode: +o balrog
12:11 🔗 dashcloud has joined #archiveteam-bs
12:46 🔗 Stilett0 has joined #archiveteam-bs
12:50 🔗 Stiletto has quit IRC (Ping timeout: 370 seconds)
13:11 🔗 primus104 has quit IRC (Leaving.)
13:26 🔗 godane SketchCow: a brief interview with the cuba SNET guy: http://media2.wptv.com/video/video_studio/2015/01/26/Jovenes_cubanos_construyeron_en_secreto__250724.mp4
13:27 🔗 godane we have video and words directly from him now
13:27 🔗 godane there are no subs sadly
13:43 🔗 Panasonic has joined #archiveteam-bs
13:43 🔗 Panasonic is now known as Ravenloft
13:44 🔗 Ravenloft https://retrogamingnr.wordpress.com/2015/07/06/17/
13:44 🔗 Ravenloft this is a nice write up
13:47 🔗 Ravenloft except when it isnt
13:51 🔗 Ravenloft the AVS by bunnyboy will be top class, he dismissed it, in a not elegant way, showing a pic of a proto incarnation of the HDMI solution that involved a top loader and was cancelled, it happened before the change to a brand new FPGA system, that is fully developed and in production
14:30 🔗 mistym has joined #archiveteam-bs
14:40 🔗 mistym has quit IRC (Remote host closed the connection)
15:07 🔗 mistym has joined #archiveteam-bs
16:08 🔗 primus104 has joined #archiveteam-bs
16:43 🔗 goekesmi_ has quit IRC (Remote host closed the connection)
16:45 🔗 Ravenloft has quit IRC (Ping timeout: 606 seconds)
16:50 🔗 mistym has quit IRC (Remote host closed the connection)
16:55 🔗 goekesmi has joined #archiveteam-bs
17:05 🔗 mistym has joined #archiveteam-bs
17:09 🔗 primus104 has quit IRC (Leaving.)
17:39 🔗 dashcloud has quit IRC (Read error: Operation timed out)
17:44 🔗 aaaaaaaaa has joined #archiveteam-bs
17:44 🔗 swebb sets mode: +o aaaaaaaaa
17:45 🔗 dashcloud has joined #archiveteam-bs
17:53 🔗 schbiwork nice, booting into rescue mode at oneprovider.com and port 22 is filtered
17:56 🔗 schbiwork oh wait, it just took a "while"
18:05 🔗 ripvanwin has joined #archiveteam-bs
18:29 🔗 dashcloud has quit IRC (Read error: Operation timed out)
18:35 🔗 dashcloud has joined #archiveteam-bs
18:43 🔗 dashcloud has quit IRC (Read error: Operation timed out)
18:59 🔗 dashcloud has joined #archiveteam-bs
19:02 🔗 primus104 has joined #archiveteam-bs
19:18 🔗 aaaaaaaaa has quit IRC (Leaving)
19:33 🔗 aaaaaaaaa has joined #archiveteam-bs
19:33 🔗 swebb sets mode: +o aaaaaaaaa
19:42 🔗 joepie91 for future reference, to mirror an entire github user (repos only): curl https://api.github.com/users/$USERNAME/repos | jq -r .[].clone_url | xargs -L 1 git clone --mirror
19:42 🔗 xmc nice
19:48 🔗 joepie91 xmc: jq is basically magic. :P
19:48 🔗 xmc yuuup
19:48 🔗 joepie91 but the docs are shit, so:
19:48 🔗 joepie91 -r means raw output, ie. no quote marks around the strings
19:48 🔗 xmc i used it the other day in a makefile to avoid writing a proper client for an api
19:48 🔗 joepie91 .[] means "for every item in the array"
19:49 🔗 joepie91 .propname gets the propname
19:49 🔗 joepie91 xmc: hehe
19:49 🔗 joepie91 honestly syntax is pretty simple, it just has really poor docs
19:49 🔗 joepie91 lol
19:49 🔗 xmc jq | curl | jq --exit-status '.success == true' && success-stuff
19:50 🔗 joepie91 oh, heh, clever
19:50 🔗 xmc yeah i was pretty pleased
19:50 🔗 joepie91 didn't know about exit-status
19:50 🔗 joepie91 but I can guess what it does
19:50 🔗 joepie91 "exit 0 if the expression is true"
19:50 🔗 xmc yep
19:50 🔗 joepie91 possibly also something like "or if it's a number, use that as exit code"
19:50 🔗 xmc i could have done just '.success' but i don't always trust truthiness
19:50 🔗 joepie91 (wild guess)
19:50 🔗 joepie91 yeah
19:50 🔗 joepie91 good call :P
19:51 🔗 joepie91 too many people do.. it's like one of the top 3 things I end up correcting
19:51 🔗 joepie91 when reviewing people's code
19:51 🔗 joepie91 "you should not do that. use == null instead"
19:51 🔗 joepie91 (it's related to truthiness)
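
Editor's note on the exit-status exchange above: as far as the jq manual describes it, --exit-status makes jq exit 0 when the last output value is anything other than false or null, 1 when it is false or null, and 4 when no output was produced at all. It does not turn a numeric output into the exit code. The Python sketch below only models that rule to show why '.success == true' is stricter than '.success'. The function names and sample payloads are hypothetical and not from the log.

```python
import json

# Sentinel meaning "the jq filter produced no output at all".
_NO_OUTPUT = object()


def jq_exit_status(last_output=_NO_OUTPUT):
    """Model the exit code of `jq --exit-status` for a given last output."""
    if last_output is _NO_OUTPUT:
        return 4  # no valid result was ever produced
    if last_output is None or last_output is False:
        return 1  # null and false are the only falsy values in jq
    return 0      # everything else (0, "", "no", [], {}) counts as truthy


def loose_check(doc):
    """Models: jq --exit-status '.success' (a missing key yields null)."""
    return jq_exit_status(doc.get("success"))


def strict_check(doc):
    """Models: jq --exit-status '.success == true' (boolean true only)."""
    return jq_exit_status(doc.get("success") is True)


if __name__ == "__main__":
    for text in ('{"success": true}', '{"success": "no"}', '{"success": 0}', '{}'):
        doc = json.loads(text)
        print(text, "loose:", loose_check(doc), "strict:", strict_check(doc))
```

A payload such as {"success": "no"} passes the loose check but fails the strict one, which is the case xmc was guarding against.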
19:53 🔗 DFJustin http://www.cs.utah.edu/~gk/atwork/
19:54 🔗 aaaaaaaaa has quit IRC (Leaving)
19:54 🔗 schbiwork curl https://api.github.com/users/"${USERNAME}"/repos | grep -Eo "https://github.com/"${USERNAME}"/.*\.git" | xargs -L 1 git clone --mirror
19:54 🔗 schbiwork :P
19:54 🔗 HCross schbiwork is it in their Paris location?
19:55 🔗 schbiwork HCross: yeah, currently we reached "Can we run some test on the server? This will cause a short downtime."
19:55 🔗 HCross Its basically Online.net rebranded
19:55 🔗 joepie91 schbiwork: good bit less reliable
19:55 🔗 HCross http://www.online.net/en/dedicated-server - go straight to the source
19:56 🔗 schbiwork HCross: i know i know but their offer was good
19:56 🔗 xmc schbiwork: you can also sed -e 's/"/\n/g' | grep '\.git$'
19:56 🔗 schbiwork :)
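
Editor's note on the GitHub mirroring one-liners above: the repos endpoint is paginated, so a single curl call returns only the first page of a user's repositories. The sketch below is one possible way to do the same job in Python while walking every page. It is an illustration under stated assumptions, not what anyone in the channel ran: the function names are hypothetical, per_page=100 is an assumed page size, the request is unauthenticated, and error handling and rate limits are left out.

```python
import json
import subprocess
import urllib.request

# Same endpoint as the one-liner, with explicit pagination parameters.
API = "https://api.github.com/users/{user}/repos?per_page=100&page={page}"


def clone_urls(payload_text):
    """Rough equivalent of `jq -r '.[].clone_url'` on one page of output."""
    return [repo["clone_url"] for repo in json.loads(payload_text)]


def fetch_all_clone_urls(user):
    """Collect clone URLs page by page until the API returns an empty list."""
    urls = []
    page = 1
    while True:
        with urllib.request.urlopen(API.format(user=user, page=page)) as resp:
            batch = clone_urls(resp.read().decode("utf-8"))
        if not batch:
            return urls
        urls.extend(batch)
        page += 1


def mirror_command(url):
    """Build the argument list for one `git clone --mirror` call."""
    return ["git", "clone", "--mirror", url]


def mirror_user(user):
    """Mirror every repository of `user` into the current directory."""
    for url in fetch_all_clone_urls(user):
        subprocess.run(mirror_command(url), check=True)
```

Calling mirror_user("someuser") would clone each repository as a bare mirror. Parsing the JSON rather than matching on ".git" avoids the reliability problem joepie91 raised about the grep variant.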
20:27 🔗 mistym has quit IRC (Remote host closed the connection)
20:35 🔗 dashcloud has quit IRC (Read error: Operation timed out)
20:38 🔗 dashcloud has joined #archiveteam-bs
20:40 🔗 mistym has joined #archiveteam-bs
21:00 🔗 mistym has quit IRC (Remote host closed the connection)
21:16 🔗 mistym has joined #archiveteam-bs
21:54 🔗 RichardG has quit IRC (Remote host closed the connection)
21:58 🔗 RichardG has joined #archiveteam-bs
22:04 🔗 kyan has quit IRC (Quit: This computer has gone to sleep)
22:28 🔗 aaaaaaaaa has joined #archiveteam-bs
22:28 🔗 swebb sets mode: +o aaaaaaaaa
22:32 🔗 aaaaaaaaa so I just discovered my second least favorite call to get from my family.
22:34 🔗 aaaaaaaaa "my computer is acting funny." "what were you doing before that?" "I went to a website and the screen was all red, so my neighbor said to try it with internet explorer."
22:35 🔗 joepie91 ;_;
22:35 🔗 RedType im hoping it was at least 10 or 11
22:35 🔗 joepie91 aaaaaaaaa: you're going to want to go over there right now if you can
22:35 🔗 joepie91 and ensure it's not a cryptolocker
22:36 🔗 aaaaaaaaa that is where I was
22:36 🔗 joepie91 ah ok
22:36 🔗 joepie91 also teach them to not ignore 'there's danger ahead' screens... >.>
22:36 🔗 joepie91 (because $5 that that was the "all red screen" they were complaining about)
22:36 🔗 RedType opendns + adblock everything + ghostery/disconnect to block oauth prompts
22:37 🔗 joepie91 RedType: fuck ghostery
22:37 🔗 joepie91 privacy badger
22:37 🔗 aaaaaaaaa yeah, the default web browser is chrome, so that is what I am thinking too.
22:37 🔗 joepie91 (ghostery is operated by a marketing company who collect and resell data on tracker blocking)
22:37 🔗 joepie91 (seriously)
22:37 🔗 RedType "this site redirected me to google? it's safe to log in to google!"
22:37 🔗 aaaaaaaaa seriously?
22:37 🔗 joepie91 yes, seriously
22:37 🔗 aaaaaaaaa heh, guess so
22:37 🔗 joepie91 hence, fuck ghostery
22:37 🔗 joepie91 privacy badger is EFF
22:37 🔗 joepie91 no such crap
22:37 🔗 joepie91 :p
22:38 🔗 joepie91 it also uses heuristics rather than blocklists
22:38 🔗 joepie91 false positive rate is nonzero, but very very low
22:38 🔗 aaaaaaaaa I'm thinking of just getting them a chromebook as my birthday present to myself.
22:38 🔗 joepie91 lol
22:38 🔗 RedType id like some proof on that claim about ghostery though, it looks like they have that as opt in
22:38 🔗 joepie91 aaaaaaaaa: chromebooks are not immune to malware
22:38 🔗 RedType the claim that they're collecting data
22:38 🔗 RedType not the claim of who owns what
22:39 🔗 joepie91 RedType: that may have been a recent change, but frankly, I would not trust a "keep marketing companies out" extension developed by a marketing company
22:39 🔗 joepie91 regardless of opt in or opt out
22:39 🔗 joepie91 massive conflict of interest
22:39 🔗 joepie91 (the issue is pretty well documented from around a year or two ago)
22:40 🔗 joepie91 also, aaaaaaaaa, you have installed unchecky, right?
22:40 🔗 xf2e has joined #archiveteam-bs
22:41 🔗 RedType well documented but you made the claim. anyways, fwiw i use noscript instead of That Jazz
22:41 🔗 dashcloud has quit IRC (Remote host closed the connection)
22:42 🔗 aaaaaaaaa never heard of unchecky, but they don't have admin privileges. Although that is more an ignorance defense than malware defense, it seems.
22:43 🔗 joepie91 aaaaaaaaa: you'll want to install it regardless
22:43 🔗 joepie91 it just auto-declines any bundled 'offers' (ie. malware)
22:43 🔗 dashcloud has joined #archiveteam-bs
22:43 🔗 joepie91 RedType: yes, and too busy to dig it up right now :P
22:43 🔗 joepie91 just wanted to give a quick tip
22:43 🔗 joepie91 also wtf, a frying pan has gone missing
22:43 🔗 RedType fair enough
22:44 🔗 RedType also, one of the best defences against cryptolockers is offline backups or backups that dont allow you to over write previous revisions
22:45 🔗 RedType that latter one is actually a difficult to come by solution for consumers
22:45 🔗 RedType at least in an open source/free form
22:49 🔗 aaaaaaaaa A friend at work has it set up so user folders are shared on his home network and then has another computer pull any changes over that way.
22:52 🔗 RedType deaugh
22:53 🔗 dashcloud has quit IRC (Read error: Operation timed out)
22:58 🔗 dashcloud has joined #archiveteam-bs
23:25 🔗 Asparagir has joined #archiveteam-bs
23:26 🔗 w0rp has quit IRC (Read error: Operation timed out)
23:45 🔗 Jonimus has quit IRC (Ping timeout: 370 seconds)
23:56 🔗 Jonimus has joined #archiveteam-bs
