#archiveteam-bs 2018-05-02,Wed

Time Nickname Message
00:02 🔗 pikhq has quit IRC (Ping timeout: 244 seconds)
00:08 🔗 BlueMax has joined #archiveteam-bs
00:09 🔗 pikhq has joined #archiveteam-bs
00:32 🔗 tomaspark has quit IRC (Read error: Operation timed out)
00:35 🔗 tomaspark has joined #archiveteam-bs
00:42 🔗 LordNigh2 has joined #archiveteam-bs
00:45 🔗 Lord_Nigh has quit IRC (Ping timeout: 252 seconds)
00:45 🔗 LordNigh2 is now known as Lord_Nigh
00:49 🔗 kisspunch has quit IRC (hub.efnet.us irc.efnet.nl)
00:49 🔗 mundus201 has quit IRC (hub.efnet.us irc.efnet.nl)
00:49 🔗 hook54321 has quit IRC (hub.efnet.us irc.efnet.nl)
00:49 🔗 kevinr has quit IRC (hub.efnet.us irc.efnet.nl)
00:49 🔗 MrRadar2 has quit IRC (hub.efnet.us irc.efnet.nl)
00:49 🔗 SketchCow has quit IRC (hub.efnet.us irc.efnet.nl)
00:49 🔗 BnAboyZ has quit IRC (hub.efnet.us irc.efnet.nl)
00:49 🔗 Tenebrae has quit IRC (hub.efnet.us irc.efnet.nl)
00:49 🔗 nyaomi has quit IRC (hub.efnet.us irc.efnet.nl)
00:49 🔗 Fusl has quit IRC (hub.efnet.us irc.efnet.nl)
00:49 🔗 tsr has quit IRC (hub.efnet.us irc.efnet.nl)
00:49 🔗 Sue has quit IRC (hub.efnet.us irc.efnet.nl)
00:49 🔗 w0rp has quit IRC (hub.efnet.us irc.efnet.nl)
00:49 🔗 Spydar007 has quit IRC (hub.efnet.us irc.efnet.nl)
00:49 🔗 BnARobin has quit IRC (hub.efnet.us irc.efnet.nl)
00:49 🔗 bsmith093 has quit IRC (hub.efnet.us irc.efnet.nl)
00:50 🔗 slyphic has quit IRC (Read error: Operation timed out)
00:51 🔗 slyphic has joined #archiveteam-bs
00:52 🔗 kisspunch has joined #archiveteam-bs
00:52 🔗 mundus201 has joined #archiveteam-bs
00:52 🔗 hook54321 has joined #archiveteam-bs
00:52 🔗 kevinr has joined #archiveteam-bs
00:52 🔗 MrRadar2 has joined #archiveteam-bs
00:52 🔗 SketchCow has joined #archiveteam-bs
00:52 🔗 BnAboyZ has joined #archiveteam-bs
00:52 🔗 Tenebrae has joined #archiveteam-bs
00:52 🔗 nyaomi has joined #archiveteam-bs
00:52 🔗 Fusl has joined #archiveteam-bs
00:52 🔗 tsr has joined #archiveteam-bs
00:52 🔗 Sue has joined #archiveteam-bs
00:52 🔗 w0rp has joined #archiveteam-bs
00:52 🔗 Spydar007 has joined #archiveteam-bs
00:52 🔗 BnARobin has joined #archiveteam-bs
00:52 🔗 bsmith093 has joined #archiveteam-bs
00:52 🔗 irc.efnet.nl sets mode: +ooo hook54321 MrRadar2 SketchCow
00:52 🔗 swebb sets mode: +o SketchCow
00:52 🔗 midas4 sets mode: +o SketchCow
00:52 🔗 midas1 sets mode: +o SketchCow
01:07 🔗 DMackey has joined #archiveteam-bs
01:17 🔗 balrog has quit IRC (Bye)
01:21 🔗 balrog has joined #archiveteam-bs
01:21 🔗 swebb sets mode: +o balrog
01:23 🔗 dashcloud has quit IRC (Read error: Operation timed out)
02:42 🔗 Jusque has quit IRC (Read error: Operation timed out)
03:55 🔗 qw3rty119 has joined #archiveteam-bs
03:55 🔗 odemg has quit IRC (Read error: Operation timed out)
04:00 🔗 odemg has joined #archiveteam-bs
04:01 🔗 Jusque has joined #archiveteam-bs
04:01 🔗 qw3rty118 has quit IRC (Read error: Operation timed out)
04:47 🔗 Lord_Nigh has quit IRC (Read error: Operation timed out)
04:50 🔗 Lord_Nigh has joined #archiveteam-bs
05:04 🔗 squires has quit IRC (Remote host closed the connection)
05:21 🔗 Lord_Nigh has quit IRC (Read error: Operation timed out)
05:24 🔗 Lord_Nigh has joined #archiveteam-bs
05:30 🔗 jmtd is now known as Jon
05:52 🔗 Mateon1 has quit IRC (Read error: Operation timed out)
05:52 🔗 Mateon1 has joined #archiveteam-bs
06:11 🔗 robogoat_ has quit IRC (Ping timeout: 252 seconds)
06:11 🔗 robogoat has joined #archiveteam-bs
06:23 🔗 LordNigh2 has joined #archiveteam-bs
06:23 🔗 Lord_Nigh has quit IRC (Ping timeout: 252 seconds)
06:23 🔗 LordNigh2 is now known as Lord_Nigh
06:26 🔗 Stilett0- has joined #archiveteam-bs
06:28 🔗 Stiletto has quit IRC (Ping timeout: 252 seconds)
06:28 🔗 Lord_Nigh has quit IRC (Read error: Operation timed out)
06:29 🔗 Lord_Nigh has joined #archiveteam-bs
06:36 🔗 Lord_Nigh has quit IRC (Read error: Operation timed out)
06:39 🔗 Lord_Nigh has joined #archiveteam-bs
07:10 🔗 LordNigh2 has joined #archiveteam-bs
07:10 🔗 Lord_Nigh has quit IRC (Ping timeout: 268 seconds)
07:11 🔗 LordNigh2 is now known as Lord_Nigh
07:13 🔗 schbirid has joined #archiveteam-bs
09:16 🔗 Lord_Nigh has quit IRC (Ping timeout: 252 seconds)
09:27 🔗 LordNigh2 has joined #archiveteam-bs
09:28 🔗 LordNigh2 is now known as Lord_Nigh
10:19 🔗 odemg has quit IRC (Read error: Operation timed out)
10:32 🔗 odemg has joined #archiveteam-bs
10:33 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
11:02 🔗 Lord_Nigh has quit IRC (Read error: Operation timed out)
11:03 🔗 Mateon1 has quit IRC (Remote host closed the connection)
11:03 🔗 Lord_Nigh has joined #archiveteam-bs
11:03 🔗 Mateon1 has joined #archiveteam-bs
11:09 🔗 ndiddy has quit IRC ()
11:12 🔗 Lord_Nigh has quit IRC (Ping timeout: 252 seconds)
11:16 🔗 Lord_Nigh has joined #archiveteam-bs
11:17 🔗 Mateon1 has quit IRC (Ping timeout: 255 seconds)
11:17 🔗 Mateon1 has joined #archiveteam-bs
11:21 🔗 Zexaron has joined #archiveteam-bs
11:32 🔗 odemg There is an ongoing, important legal case involving what we do every day. The defendant's lawyer wants your opinions and insight into why we do what we do and the philosophy behind it all. Have your say here: https://redd.it/8gcji4
11:40 🔗 odemg https://www.reddit.com/r/DataHoarder/comments/8gcji4/the_philosophy_behind_data_hoarding_and_amateur/
12:15 🔗 plue has quit IRC (Ping timeout: 260 seconds)
12:23 🔗 plue has joined #archiveteam-bs
13:56 🔗 RichardG has quit IRC (Read error: Connection reset by peer)
13:57 🔗 RichardG has joined #archiveteam-bs
15:07 🔗 TC01 has quit IRC (Remote host closed the connection)
15:12 🔗 ebel has joined #archiveteam-bs
15:13 🔗 ebel JAA: I was a little worried that archivebot was only supposed to be for "Official(tm) Things"
15:22 🔗 JAA ebel: ArchiveBot is for whatever we throw at it. :-)
15:23 🔗 ebel ah, but who is we? :P
15:23 🔗 ebel I would be interested in your aforementioned scripts for Tw & FB.
15:25 🔗 JAA "We" = anyone with voice or ops in #archivebot. Now includes yourself.
15:26 🔗 ebel Oh :D Thanks.
15:30 🔗 JAA So, about the scraper: any name suggestions? That's the main thing that's stopping me from releasing it, honestly.
15:30 🔗 JAA It's supposed to be a generic social media scraper, so the name should reflect that. I suck at inventing names, so it's currently called "social-media-scraper", which is an awful name.
15:31 🔗 JAA It supports Twitter, Instagram, and (with limitations) Facebook so far.
15:37 🔗 ebel What's the opposite of "share", since it does the opposite of that, right? It saves, not shares :P
15:38 🔗 godane so i got anyfesto to sort of work
15:39 🔗 godane it's part of my rpi3 archivebox project
15:39 🔗 godane i only got kiwix to work
15:40 🔗 godane i have to test if vlc radio would work later
15:44 🔗 JAA ebel: It doesn't "save" anything on its own, actually. It only *collects* the posts. At the moment, that just means extracting the link to each post, though additional data (e.g. post contents, author, date) could be added easily to make it a proper scraper. That's actually something I intend to do.
15:46 🔗 ebel I've started using grab-site. It's amazing how some websites that seem small are actually pretty big. Hundreds/thousands of things in the queue!
15:47 🔗 JAA That's tiny. It gets fun when there are millions in the queue. ;-)
15:47 🔗 ebel also, holy moley, but why the hell does facebook have so much JS!
15:48 🔗 ebel "https://static.xx.fbcdn.net/rsrc.php/blah.js"
15:48 🔗 JAA Fun fact: ArchiveBot was initially created for small crawls with 100 or 1000 pages. Nowadays, we hardly have any crawls with fewer than 100k URLs.
15:48 🔗 JAA Yes, Facebook sucks.
16:01 🔗 bwn has quit IRC (Read error: Operation timed out)
16:01 🔗 phuzion has quit IRC (Remote host closed the connection)
16:08 🔗 phuzion has joined #archiveteam-bs
16:11 🔗 bwn has joined #archiveteam-bs
16:34 🔗 wp494_ has joined #archiveteam-bs
16:37 🔗 wp494 has quit IRC (Ping timeout: 252 seconds)
16:40 🔗 Despatche has quit IRC (Ping timeout: 506 seconds)
16:41 🔗 Despatche has joined #archiveteam-bs
16:49 🔗 betamax has quit IRC (Ping timeout: 252 seconds)
17:06 🔗 betamax has joined #archiveteam-bs
17:38 🔗 Despatche has quit IRC (Quit: Read error: Connection reset by peer)
17:39 🔗 Despatche has joined #archiveteam-bs
18:52 🔗 ebel I'm playing with wpull, and it's amazing. It (& PhantomJS & webrecorder player) is exactly what I had been looking for for ages! :D
18:53 🔗 ebel Is it possible to get it not to download all the analytics & ads on a page?
18:54 🔗 ebel (I'm not really sure how to do that. I want --page-requisites, but not the advert stuff. Maybe PhantomJS with an adblocker installed?)
18:57 🔗 JAA PhantomJS is ugly, and its integration in wpull has some issues (namely massive duplication in the archives). Look into IA's brozzler maybe (headless Chromium + warcprox).
18:57 🔗 JAA For ignoring in wpull, --reject-regex. Note that you can use that option only once, so you have to include all ignore patterns in one option.
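Because --reject-regex can only be given once, several ignore patterns have to be merged into a single regular expression. The following is a minimal sketch of one way to do that: join the patterns with `|`, wrapping each in a non-capturing group. The ad/analytics host patterns below are hypothetical examples, not a vetted blocklist. Each pattern is anchored with `^` so that the result behaves the same whether the pattern is applied with search or match semantics (an assumption, since this sketch does not depend on how wpull applies it internally).

```python
import re

# Hypothetical example patterns for ad/analytics hosts; adjust to taste.
IGNORE_PATTERNS = [
    r"^https?://[^/]*google-analytics\.com/",
    r"^https?://[^/]*doubleclick\.net/",
    r"^https?://[^/]*googlesyndication\.com/",
]


def combine_patterns(patterns):
    """Join several ignore patterns into one alternation for a single --reject-regex."""
    return "|".join(f"(?:{p})" for p in patterns)


combined = combine_patterns(IGNORE_PATTERNS)

# Sanity-check the combined pattern with Python's re before handing it to wpull.
compiled = re.compile(combined)
assert compiled.search("https://www.google-analytics.com/analytics.js")
assert not compiled.search("https://example.com/index.html")

# Print the option in a shell-quoted form (none of the patterns contain single quotes).
print(f"--reject-regex '{combined}'")
```

Testing the merged expression locally first is cheaper than discovering a typo in it halfway through a crawl.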
18:59 🔗 ebel another option could be a web proxy which just blocks that sort of URL. adblocking at a proxy level. Pretty sure I've seen them
19:00 🔗 JAA Also, if you want to use wpull, use version 1.2.3. Version 2.0.x is pretty unstable and has weird bugs.
19:00 🔗 ebel I have 1.2. It's what's in pip
19:00 🔗 JAA By the way, generally, we want to archive ads as well.
19:01 🔗 JAA Uh, pip install wpull should install 2.0.1.
19:01 🔗 ebel You might, but I don't. :D
19:01 🔗 ebel It's great that webrecorder player can work in my browser (which has an adblocker), so that might be a solution.
19:05 🔗 ebel I'm still impressed and blown away. I was looking for this sort of thing (on and off) for a little while. How did I miss this???
19:05 🔗 xmc :)
19:06 🔗 xmc wpull is an archiveteam-developed tool, it's good but kinda needs a rewrite and we don't really advertise it
19:11 🔗 JAA chfoo-developed* (except for a handful of commits). Credit where credit is due.
19:13 🔗 xmc aye, sorry
19:13 🔗 xmc i couldn't remember who
19:16 🔗 Kaz has quit IRC (Ping timeout: 260 seconds)
19:35 🔗 lindalap_ has joined #archiveteam-bs
19:35 🔗 lindalap has quit IRC (Write error: Connection reset by peer)
19:35 🔗 lindalap_ is now known as lindalap
19:49 🔗 jschwart has joined #archiveteam-bs
19:54 🔗 godane has quit IRC (Ping timeout: 257 seconds)
19:56 🔗 godane has joined #archiveteam-bs
19:56 🔗 svchfoo3 sets mode: +o godane
20:01 🔗 w00dsman has joined #archiveteam-bs
20:04 🔗 Kaz has joined #archiveteam-bs
20:04 🔗 Kaz ..oops
20:04 🔗 Kaz maybe shouldn't have deleted my znc host
20:08 🔗 w00dsman has quit IRC (WeeChat 2.1)
20:09 🔗 w00dsman has joined #archiveteam-bs
20:14 🔗 schbirid has quit IRC (Quit: Leaving)
20:26 🔗 BlueMax has joined #archiveteam-bs
20:39 🔗 godane has quit IRC (Read error: Operation timed out)
20:39 🔗 TC01 has joined #archiveteam-bs
20:41 🔗 godane has joined #archiveteam-bs
20:53 🔗 SimpBrain re: foodspotting
20:54 🔗 SimpBrain profiles can be found via numbers e.g. http://www.foodspotting.com/462800
20:54 🔗 SimpBrain reviews can be found via numbers e.g. http://www.foodspotting.com/reviews/6163336
20:55 🔗 SimpBrain places can be found via numbers too e.g. http://www.foodspotting.com/places/981555
20:55 🔗 SimpBrain :)
21:02 🔗 JAA SimpBrain: Can you try to figure out what the maximum number is for each of those please?
21:03 🔗 SimpBrain will do in a few mins
21:13 🔗 SimpBrain profiles last number: 5171135
21:14 🔗 SimpBrain reviews final number: 6163338
21:18 🔗 SimpBrain places final number: 1059662
21:18 🔗 SimpBrain profiles first number: 1
21:19 🔗 SimpBrain yeah all start with 1
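Since Foodspotting profiles, reviews and places are addressed by sequential numeric IDs, a candidate URL list can be generated directly from the first and last IDs reported above. This is a minimal sketch that assumes every ID in each range is worth trying; in practice some IDs will likely be deleted or missing, and the upper bounds were only the largest IDs found at the time of checking. The function names are hypothetical helpers, not part of any existing Archive Team tool.

```python
# URL templates and (first, last) IDs as reported in the channel.
RANGES = {
    "profiles": ("http://www.foodspotting.com/{}", 1, 5171135),
    "reviews": ("http://www.foodspotting.com/reviews/{}", 1, 6163338),
    "places": ("http://www.foodspotting.com/places/{}", 1, 1059662),
}


def count_ids(kind):
    """Number of candidate IDs in the inclusive range for one item type."""
    _, first, last = RANGES[kind]
    return last - first + 1


def iter_urls(kind):
    """Lazily yield every candidate URL for one item type, in ID order."""
    template, first, last = RANGES[kind]
    for i in range(first, last + 1):
        yield template.format(i)


def write_url_list(kind, path):
    """Write a plain-text URL list, one URL per line."""
    with open(path, "w") as f:
        for url in iter_urls(kind):
            f.write(url + "\n")
```

For example, `write_url_list("places", "foodspotting-places.txt")` writes about 1.06 million lines; all three ranges together come to 12,394,135 candidate URLs, which is why a generator is used instead of building the lists in memory.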
21:34 🔗 RichardG has quit IRC (Read error: Operation timed out)
21:41 🔗 RichardG has joined #archiveteam-bs
21:48 🔗 wp494 has joined #archiveteam-bs
21:48 🔗 svchfoo3 sets mode: +o wp494
21:53 🔗 wp494_ has quit IRC (Ping timeout: 492 seconds)
21:53 🔗 odemg https://www.reddit.com/r/opendirectories/comments/8gl6eq/extensive_amiga_archive_theeyeeu
22:05 🔗 jschwart has quit IRC (Konversation terminated!)
22:38 🔗 godane has quit IRC (Ping timeout: 252 seconds)
22:54 🔗 godane has joined #archiveteam-bs
22:54 🔗 svchfoo3 sets mode: +o godane
22:59 🔗 godane has quit IRC (Ping timeout: 255 seconds)
23:06 🔗 godane has joined #archiveteam-bs
23:06 🔗 svchfoo3 sets mode: +o godane
23:34 🔗 tuluu has quit IRC (Read error: Operation timed out)