#archiveteam 2018-02-23,Fri


Who | What | When
***dashcloud has joined #archiveteam [00:02]
RichardG has quit IRC (Read error: Connection reset by peer)
RichardG has joined #archiveteam
[00:11]
....... (idle for 31mn)
dashcloud has quit IRC (Read error: Operation timed out) [00:43]
dashcloud has joined #archiveteam [00:48]
.... (idle for 19mn)
WubTheCap has joined #archiveteam
muramasa has joined #archiveteam
[01:07]
.............. (idle for 1h6mn)
bitBaron has quit IRC (Quit: Bye!)
Odd0002 has quit IRC (Ping timeout: 260 seconds)
[02:15]
Odd0002 has joined #archiveteam [02:25]
..... (idle for 21mn)
yuitimoth has quit IRC (Ping timeout: 250 seconds) [02:46]
........... (idle for 54mn)
nightpool has joined #archiveteam [03:40]
.... (idle for 18mn)
muramasa has quit IRC (Read error: Connection reset by peer)
muramasa has joined #archiveteam
espes__ has quit IRC (Ping timeout: 252 seconds)
espes__ has joined #archiveteam
djbeadle has joined #archiveteam
[03:58]
muramasa has quit IRC (Read error: Connection reset by peer) [04:20]
.... (idle for 18mn)
qw3rty111 has joined #archiveteam [04:38]
qw3rty119 has quit IRC (Read error: Operation timed out) [04:44]
.... (idle for 19mn)
octothorp has quit IRC (Remote host closed the connection)
octothorp has joined #archiveteam
[05:03]
muramasa has joined #archiveteam [05:11]
..... (idle for 20mn)
ivan has quit IRC (Leaving)
ivan has joined #archiveteam
djbeadle has quit IRC (My MacBook has gone to sleep.)
Jens has quit IRC (Remote host closed the connection)
Jens has joined #archiveteam
rsznick has joined #archiveteam
rsznik has quit IRC (Read error: Operation timed out)
fie has joined #archiveteam
[05:31]
....... (idle for 31mn)
RichardG has quit IRC (Ping timeout: 250 seconds) [06:18]
.... (idle for 15mn)
bithippo has joined #archiveteam [06:33]
...... (idle for 28mn)
zyphlar_ has joined #archiveteam [07:01]
bithippo has quit IRC (Quit: Page closed) [07:06]
....... (idle for 31mn)
bwn has quit IRC (Read error: Operation timed out) [07:37]
...... (idle for 27mn)
SketchCow: https://github.com/N0taN3rd/Squidwarc
Anyone want to check out, report back
[08:04]
***bwn has joined #archiveteam [08:06]
PurpleSym: Uh, there’s chromebot in #archivebot. [08:15]
SketchCow: Yes
And does this have different features, etc
[08:19]
PurpleSym: chromebot currently lacks support for recursive crawls. squidwarc has no IRC bot (config file based) and no custom behavior scripts (i.e. JavaScript injected into the page to perform some action).
Looks like they are using the same interface as chromebot to capture network requests (no proxy).
[08:25]
***Mateon1 has quit IRC (Ping timeout: 250 seconds)
Mateon1 has joined #archiveteam
Mateon1 has quit IRC (Connection closed)
[08:32]
.... (idle for 16mn)
Ctrl has joined #archiveteam [08:48]
atomotic has joined #archiveteam [09:01]
schbirid has joined #archiveteam [09:09]
...... (idle for 28mn)
JAA: ... which I still think is not the right way to tackle it. [09:37]
......... (idle for 40mn)
***z00nx has quit IRC (Ping timeout: 252 seconds)
mr_archiv has quit IRC (Ping timeout: 252 seconds)
[10:17]
mr_archiv has joined #archiveteam
z00nx has joined #archiveteam
[10:23]
........ (idle for 35mn)
atomotic has quit IRC (Quit: atomotic)
zyphlar_ has quit IRC (Quit: Connection closed for inactivity)
[10:58]
......... (idle for 41mn)
fie has quit IRC (Quit: Leaving) [11:40]
...... (idle for 25mn)
Ctrl has quit IRC (Read error: Operation timed out) [12:05]
....... (idle for 34mn)
atomotic has joined #archiveteam [12:39]
djbeadle has joined #archiveteam [12:51]
....... (idle for 30mn)
djbeadle has quit IRC (My MacBook has gone to sleep.) [13:21]
....... (idle for 30mn)
yuitimoth has joined #archiveteam [13:51]
..... (idle for 21mn)
RichardG has joined #archiveteam [14:12]
jrwr: JAA: the thing is with "shitty" javascript webapps, rendered HTML might be best to scrape [14:18]
JAA: jrwr: In my opinion, we should capture both the raw network traffic of everything retrieved by the browser as well as the resulting DOM. I'm not entirely sure how the latter should be stored though. It doesn't fit into the WARC model of requests and responses; maybe a new record type for such "derived" resources would be needed.
brozzler handles the network traffic side (and wpull + PhantomJS as well, I think). chromebot and Squidwarc are somewhere in between as far as I understand it (not the raw traffic because Chromium's API doesn't expose that -- transfer encoding is stripped, for example, and headers are in a standardised format rather than the raw bytes sent by the server -- but it's also not the final DOM).
archive.is would be an example for storing only the resulting DOM (I believe).
[14:23]
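Editor's sketch of the idea JAA describes above: keep the network response and the rendered DOM as two linked WARC records. This is only an illustration under stated assumptions, not what brozzler, chromebot or Squidwarc actually do. It uses the existing `resource` record type as a stand-in for the hypothetical "derived" type, and links it to the response with `WARC-Concurrent-To`. The helper name `build_warc_record`, the example URL, and both payloads are made up for the example; the "raw" HTTP bytes are a placeholder, since, as noted above, Chromium's API does not expose them.

```python
import uuid
from datetime import datetime, timezone


def build_warc_record(record_type, target_uri, content_type, payload, extra_headers=None):
    """Serialize one WARC/1.0 record: header block, blank line, payload, two CRLFs.

    Hypothetical helper for illustration; returns (record_id, record_bytes).
    """
    record_id = f"<urn:uuid:{uuid.uuid4()}>"
    headers = [
        ("WARC-Type", record_type),
        ("WARC-Record-ID", record_id),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    headers.extend((extra_headers or {}).items())
    head = "WARC/1.0\r\n" + "".join(f"{name}: {value}\r\n" for name, value in headers)
    return record_id, head.encode("utf-8") + b"\r\n" + payload + b"\r\n\r\n"


page_url = "https://example.com/app"  # assumed URL, for illustration only

# Placeholder for the bytes the server sent (not obtainable from Chromium's API).
raw_http = (b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"
            b"<html><body><div id=\"app\"></div></body></html>")

# Placeholder for the DOM after the page's JavaScript has run.
rendered_dom = b"<html><body><div id=\"app\"><p>Hello</p></div></body></html>"

# Network side: an ordinary response record.
response_id, response_record = build_warc_record(
    "response", page_url, "application/http; msgtype=response", raw_http)

# Derived side: the DOM snapshot, pointing back at the response it came from.
dom_id, dom_record = build_warc_record(
    "resource", page_url, "text/html", rendered_dom,
    {"WARC-Concurrent-To": response_id})

warc_bytes = response_record + dom_record
```

Whether `resource` is an acceptable home for a DOM snapshot, or a dedicated record type would be better, is exactly the open question raised in the discussion; the sketch only shows that the two captures can sit side by side and stay linked.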
***atomotic has quit IRC (Quit: atomotic) [14:34]
Feld0 has joined #archiveteam
GLaDOS has joined #archiveteam
djbeadle has joined #archiveteam
[14:39]
........ (idle for 36mn)
Ctrl has joined #archiveteam [15:20]
Ctrl has quit IRC (Read error: Operation timed out) [15:33]
GLaDOS has quit IRC (Quit: Leaving) [15:44]
atomotic has joined #archiveteam [15:50]
.... (idle for 17mn)
atomotic has quit IRC (Quit: atomotic) [16:07]
.... (idle for 15mn)
Mateon1 has joined #archiveteam
Mateon1 has quit IRC (Connection closed)
[16:22]
....... (idle for 30mn)
Ctrl has joined #archiveteam [16:52]
...... (idle for 27mn)
SketchCow: https://www.twitch.tv/textfilesdotcom
GET SOME LEARNING, FELLOWS
[17:19]
.......... (idle for 49mn)
***atomicthu has quit IRC (Read error: Operation timed out)
c4rc4s has quit IRC (Ping timeout: 600 seconds)
[18:08]
........ (idle for 38mn)
c4rc4s has joined #archiveteam [18:48]
..... (idle for 22mn)
atomicthu has joined #archiveteam [19:10]
......... (idle for 42mn)
jschwart has joined #archiveteam [19:52]
...... (idle for 25mn)
n00b559 has joined #archiveteam [20:17]
n00b559: Hello, I lost a video in yahoo videos back in 2011 or so, can I find out how to find it on here? [20:19]
***RichardG has quit IRC (Read error: Connection reset by peer)
RichardG has joined #archiveteam
n00b559 has quit IRC (Ping timeout: 260 seconds)
[20:23]
....... (idle for 33mn)
bsmith093 has quit IRC (Quit: Leaving.)
bsmith093 has joined #archiveteam
[20:59]
...... (idle for 27mn)
ats has quit IRC (Quit: it's new kernel time!)
ats has joined #archiveteam
[21:29]
sekolyn has joined #archiveteam
octothorp has quit IRC (Read error: Connection reset by peer)
[21:45]
sekolyn has quit IRC (Read error: Connection reset by peer)
octothorp has joined #archiveteam
[21:52]
Asparagir has joined #archiveteam [21:59]
...... (idle for 28mn)
schbirid has quit IRC (Quit: Leaving)
dashcloud has quit IRC (Ping timeout: 260 seconds)
dashcloud has joined #archiveteam
dashcloud has quit IRC (Read error: Operation timed out)
dashcloud has joined #archiveteam
[22:27]
ola_norsk has joined #archiveteam
macker has quit IRC (Quit: out of beer...)
[22:51]
ola_norsk: is there anyone at Internet Archive that is responsible for IA social media presence? I'd like to hand over https://www.minds.com/InternetArchive
(i don't have faceboor or twitter :/ )
facebook*
[22:52]
***macker- has joined #archiveteam [22:54]
ola_norsk: e.g. some @archive.org mail i could send the u/p to? [22:54]
***macker- is now known as macker [22:54]
jschwart has quit IRC (Quit: Konversation terminated!) [23:01]
kitties has joined #archiveteam
RichardG has quit IRC (Read error: Connection reset by peer)
RichardG has joined #archiveteam
[23:07]
* ola_norsk is just going to guess Alexis Rossi and CC the rest.. [23:15]
https://www.emailsherlock.com/emailsearch/jscott@archive.org/ [23:22]
JAA: I'd say just email info@archive.org, asking them if they're interested etc. You'll either get a reply from the person you're looking for or instructions how to contact them. [23:24]
ola_norsk: holy crap, how that get pasted into the window??
JAA: ok
does irssi do drag-n-drop? :/
JAA: i'm just going to CC though..these days one email address is usually not enough.
[23:28]
***ola_norsk has quit IRC (Wanting people to listen, you can't just tap them on the shoulder anymore. You have to hit them with a sledgehammer) [23:32]
BobJonkma has joined #archiveteam [23:40]
djbeadle has quit IRC (My MacBook has gone to sleep.) [23:49]
Stiletto has quit IRC (Ping timeout: 246 seconds)
Stilett0 has joined #archiveteam
[23:55]
