#archiveteam-bs 2018-09-22, Sat


Who  What  When
*** closure has quit IRC (Read error: Connection reset by peer)
closure has joined #archiveteam-bs
[00:01]
caff has quit IRC (Read error: Connection reset by peer) [00:10]
m007a83_ has joined #archiveteam-bs
m007a83 has quit IRC (Read error: Operation timed out)
schbirid has quit IRC (Read error: Operation timed out)
schbirid has joined #archiveteam-bs
[00:18]
m007a83_ is now known as m007a83
m007a83 has quit IRC (Quit: Fuck you Comcast)
m007a83 has joined #archiveteam-bs
[00:33]
..... (idle for 24mn)
ndiddy has quit IRC (Ping timeout: 252 seconds)
closure has quit IRC (Read error: Operation timed out)
schbirid has quit IRC (Read error: Operation timed out)
schbirid has joined #archiveteam-bs
closure has joined #archiveteam-bs
[00:57]
..... (idle for 20mn)
ndiddy has joined #archiveteam-bs
ndiddy has quit IRC (Client Quit)
odemg_ has quit IRC (Ping timeout: 268 seconds)
[01:23]
odemg_ has joined #archiveteam-bs
schbirid has quit IRC (Read error: Operation timed out)
schbirid has joined #archiveteam-bs
[01:38]
.... (idle for 19mn)
closure has quit IRC (Read error: Connection reset by peer)
closure has joined #archiveteam-bs
[02:00]
...... (idle for 29mn)
PhrackD has quit IRC (bye) [02:31]
godane: SketchCow: any news about me getting return labels? [02:31]
pikhq: One of the ironies of that "plan" is, I can't help but imagine that even people opposed to the CoC would be *really* pissed by someone trying that. [02:38]
*** PhrackD has joined #archiveteam-bs [02:40]
.... (idle for 19mn)
closure has quit IRC (Read error: Connection reset by peer)
closure has joined #archiveteam-bs
[02:59]
joepie91_: zyphlar: there are a few of these supposedly-hyper-neutral sites that inevitably become alt-right rags pretending to be 'fair and balanced'
in NL, we have The Post Online doing the same
they may have once had the stated purpose, but if so, then the people running the place apparently didn't understand that no, not all ideologies should be on equal footing
with this inevitable result :P
[03:10]
*** davidar has joined #archiveteam-bs [03:16]
zyphlar: Yaaaay [03:20]
*** Ctrl-S has joined #archiveteam-bs [03:20]
godane has quit IRC (Ping timeout: 252 seconds)
odemg_ has quit IRC (Ping timeout: 268 seconds)
[03:25]
odemg_ has joined #archiveteam-bs [03:39]
schbirid has quit IRC (Read error: Operation timed out)
schbirid has joined #archiveteam-bs
archodg__ has joined #archiveteam-bs
archodg_ has quit IRC (Ping timeout: 252 seconds)
odemg_ has quit IRC (Ping timeout: 268 seconds)
[03:44]
closure has quit IRC (Read error: Connection reset by peer)
closure has joined #archiveteam-bs
odemg_ has joined #archiveteam-bs
[04:00]
TC04 has joined #archiveteam-bs
TC01 has quit IRC (Read error: Operation timed out)
[04:11]
.... (idle for 15mn)
godane has joined #archiveteam-bs
svchfoo3 sets mode: +o godane
schbirid has quit IRC (Read error: Operation timed out)
schbirid has joined #archiveteam-bs
[04:27]
...... (idle for 27mn)
closure has quit IRC (Read error: Connection reset by peer)
closure has joined #archiveteam-bs
[04:59]
........... (idle for 54mn)
HCross has joined #archiveteam-bs [05:54]
closure has quit IRC (Read error: Connection reset by peer) [06:00]
...... (idle for 28mn)
closure_ has joined #archiveteam-bs [06:28]
closure has joined #archiveteam-bs
closure_ has quit IRC (Read error: Connection reset by peer)
bsmith093 has joined #archiveteam-bs
[06:38]
.... (idle for 16mn)
schbirid has quit IRC (Read error: Operation timed out)
schbirid has joined #archiveteam-bs
closure has quit IRC (Read error: Operation timed out)
[06:57]
................... (idle for 1h33mn)
schbirid has quit IRC (Read error: Operation timed out) [08:32]
PurpleSym: VoynichCr: Do you have plans for adding other URLs to your bot? https://www.wikidata.org/wiki/Property:P3265 for example? [08:42]
There’s more social media accounts hidden behind P3040 P2002 P2397 P2003 and P2013. [08:48]
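The property IDs PurpleSym lists could be fetched for a single Wikidata item with one SPARQL query. The sketch below only builds the query text and sends nothing over the network. The helper name `build_query` and the example item `Q42` are illustrative assumptions, and nothing here reflects how VoynichCr's bot actually works.

```python
# Property IDs exactly as named in the discussion above.
SOCIAL_PROPERTIES = ["P3265", "P3040", "P2002", "P2397", "P2003", "P2013"]


def build_query(item_id, properties=SOCIAL_PROPERTIES):
    """Return SPARQL text listing one item's values for each given property.

    item_id is a Wikidata item ID such as "Q42" (used here only as an example).
    """
    # wdt: is the prefix for direct ("truthy") property claims on Wikidata.
    values = " ".join(f"wdt:{pid}" for pid in properties)
    return (
        "SELECT ?prop ?value WHERE {\n"
        f"  VALUES ?prop {{ {values} }}\n"
        f"  wd:{item_id} ?prop ?value .\n"
        "}"
    )


query = build_query("Q42")
print(query)
```

The resulting text could be pasted into the Wikidata Query Service; each result row would pair a property with an account identifier, which still has to be turned into a URL per site.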
............... (idle for 1h13mn)
*** chferfa has quit IRC () [10:01]
........ (idle for 37mn)
chferfa has joined #archiveteam-bs [10:38]
.................. (idle for 1h28mn)
BlueMax has quit IRC (Read error: Connection reset by peer) [12:06]
VoynichCr: PurpleSym: i could, but the table can get too complicated, and searching for TW/FB/etc urls on ArchiveBot Viewer isn't as easy as website domains
suggestions are welcome
[12:11]
...... (idle for 25mn)
*** eientei95 has joined #archiveteam-bs [12:37]
.... (idle for 16mn)
PurpleSym: VoynichCr: You could rowspan the common lines (Name, Description, …) and have one line per URL for ArchiveBot and Archive details. Since nobody edits the table by hand this should be fine.
I think we lack the metadata to search for URL prefixes. You could search the IRC logs though.
[12:53]
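The rowspan suggestion could look roughly like this when the table is generated as MediaWiki wikitext. This is a sketch under assumptions: the function name `render_rows` and the three-column layout are hypothetical, and the real table's per-URL ArchiveBot and archive columns are reduced to a single URL cell.

```python
def render_rows(name, description, urls):
    """Render one entry as wikitext table rows, one row per URL.

    The Name and Description cells are emitted once and span all URL rows.
    """
    if not urls:
        raise ValueError("an entry needs at least one URL")
    span = len(urls)
    lines = []
    for index, url in enumerate(urls):
        lines.append("|-")  # start of a new table row
        if index == 0:
            # Shared cells appear only on the first row of the entry.
            lines.append(f'| rowspan="{span}" | {name}')
            lines.append(f'| rowspan="{span}" | {description}')
        lines.append(f"| {url}")
    return "\n".join(lines)


print(render_rows("Example Org", "Placeholder entry",
                  ["https://example.org/", "https://twitter.com/example"]))
```

Because the rows are generated, the extra markup costs nothing to maintain, which matches the point that nobody edits the table by hand.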
JAA: Indeed, but it's not easy to search for social media jobs in any case because we usually use !ao < jobs for that, not !a on the actual social media site. [12:57]
PurpleSym: True, that’s an issue. Your metadata collection is not the right place for stuff like that, JAA? [12:58]
JAA: PurpleSym: You mean my archivebot-archives tool? That would only find the URL of the URL list. Not the URLs inside that list.
Hmm, actually, the URL list is also on IA, so I guess it could be extended that way.
[13:01]
PurpleSym: Yep, that one. I didn’t know there were .json files with metadata for each grab.
Neat.
[13:04]
JAA: Yeah, the JSON file contains some very basic metadata on the job, and the -urls.txt file contains the URL list for !ao < and !a < jobs.
The JSON file is where the viewer gets the job URL from (well, if that weren't broken).
[13:04]
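Since the -urls.txt files hold the actual URLs of list jobs, finding social media grabs could come down to scanning those lists by hostname. A minimal sketch, assuming one URL per line in the file; the host set and the function name `social_urls` are illustrative and not part of JAA's tool.

```python
from urllib.parse import urlsplit

# Illustrative host list; extend as needed.
SOCIAL_HOSTS = {"twitter.com", "facebook.com", "instagram.com"}


def social_urls(url_list_text, hosts=SOCIAL_HOSTS):
    """Return the URLs from a URL list whose hostname is a known social site."""
    found = []
    for line in url_list_text.splitlines():
        url = line.strip()
        if not url:
            continue  # skip blank lines
        host = (urlsplit(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]  # treat www.example.com like example.com
        if host in hosts:
            found.append(url)
    return found


sample = "https://twitter.com/archiveteam\nhttps://example.org/\n"
print(social_urls(sample))
```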
..... (idle for 24mn)
VoynichCr: Does the IA API show where a grab comes from? spider, archivebot... [13:29]
*** wp494 has quit IRC (Read error: Operation timed out)
wp494 has joined #archiveteam-bs
[13:32]
JAA: PurpleSym: Speaking of that tool, do you have any ideas how I could make it more useful? grepping a directory of ~16k YAML files works, I guess, but it's definitely not optimal. And a DB (e.g. sqlite3) wouldn't work well with version control.
I'd have to restructure it anyway if I wanted to integrate the information from JSON and URL lists in there.
[13:36]
I suppose I could create a fake DB using one YAML file per table. I wonder if that would actually be better though. [13:51]
PurpleSym: JAA: Sure, sqlite is not optimal for git storage, but if you want search functionality it is definitely the way to go. [13:56]
JAA: PurpleSym: Yeah. Or I could keep the data in a text format in the repo for git purposes and have an import script which inserts it into an SQLite DB for searches etc.
Having the actual DB in git could get nasty quickly due to size and diffs. The repo is already several hundred MiB in the most basic format possible.
[13:58]
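The split JAA describes (text files in git, a throwaway SQLite database for searching) might be sketched as follows. The record fields `job_id` and `url`, the table name, and both function names are assumptions made for illustration; parsing the YAML files into dictionaries is left out, since that depends on the repo's actual layout.

```python
import sqlite3


def build_index(records, db_path=":memory:"):
    """Load already-parsed records (dicts) into an SQLite table and return the connection."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs (job_id TEXT PRIMARY KEY, url TEXT NOT NULL)"
    )
    # Re-importing the same job replaces the old row, so the import can be rerun.
    conn.executemany(
        "INSERT OR REPLACE INTO jobs (job_id, url) VALUES (:job_id, :url)", records
    )
    conn.commit()
    return conn


def search(conn, fragment):
    """Return (job_id, url) rows whose URL contains the fragment.

    Note: % and _ inside the fragment act as LIKE wildcards.
    """
    rows = conn.execute(
        "SELECT job_id, url FROM jobs WHERE url LIKE ? ORDER BY job_id",
        (f"%{fragment}%",),
    )
    return rows.fetchall()


conn = build_index([
    {"job_id": "a1", "url": "https://example.org/"},
    {"job_id": "b2", "url": "https://example.com/page"},
])
print(search(conn, "example.com"))
```

The database file itself would stay out of version control and be rebuilt from the text files whenever needed.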
PurpleSym: SQL dump? [14:00]
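One way to read the SQL dump suggestion: keep the dump as text in git and rebuild the binary database from it. Python's `sqlite3` module offers `Connection.iterdump()` for this. The table and row below are made-up placeholders, and whether the resulting diffs stay manageable at this repo's size is an open question the sketch does not answer.

```python
import sqlite3


def dump_sql(conn):
    """Return the whole database as SQL text suitable for committing to git."""
    return "\n".join(conn.iterdump()) + "\n"


# Placeholder database with a single hypothetical row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (job_id TEXT PRIMARY KEY, url TEXT)")
conn.execute("INSERT INTO jobs VALUES (?, ?)", ("a1", "https://example.org/"))
conn.commit()

dump = dump_sql(conn)

# Rebuilding a database from the text dump.
restored = sqlite3.connect(":memory:")
restored.executescript(dump)
print(restored.execute("SELECT job_id, url FROM jobs").fetchall())
```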
....... (idle for 31mn)
*** Pixi has quit IRC (Quit: Pixi) [14:31]
......... (idle for 43mn)
Jens has quit IRC (Remote host closed the connection)
Jens has joined #archiveteam-bs
closure_ has joined #archiveteam-bs
[15:14]
Pixi has joined #archiveteam-bs [15:20]
..... (idle for 21mn)
zhongfu has joined #archiveteam-bs [15:41]
.... (idle for 17mn)
closure_ has quit IRC (Read error: Connection reset by peer)
closure has joined #archiveteam-bs
[15:58]
............ (idle for 59mn)
closure has quit IRC (Read error: Connection reset by peer)
closure_ has joined #archiveteam-bs
[16:58]
....... (idle for 30mn)
arbin_ has joined #archiveteam-bs [17:28]
....... (idle for 32mn)
closure_ has quit IRC (Read error: Connection reset by peer) [18:00]
closure has joined #archiveteam-bs [18:08]
.... (idle for 15mn)
jmtd is now known as Jon [18:23]
.... (idle for 17mn)
closure has quit IRC (Read error: Connection reset by peer) [18:40]
........ (idle for 36mn)
closure has joined #archiveteam-bs [19:16]