Time |
Nickname |
Message |
00:08
🔗
|
|
icedice has quit IRC (Leaving) |
00:11
🔗
|
|
bitbit has quit IRC (Remote host closed the connection) |
00:14
🔗
|
|
Raccoon has joined #archiveteam-ot |
01:13
🔗
|
|
BlueMax has quit IRC (Quit: Leaving) |
03:30
🔗
|
|
dhyan_nat has joined #archiveteam-ot |
03:32
🔗
|
|
odemg has quit IRC (Ping timeout: 265 seconds) |
03:45
🔗
|
|
odemg has joined #archiveteam-ot |
04:10
🔗
|
|
BlueMax has joined #archiveteam-ot |
04:36
🔗
|
|
robogoat has quit IRC (Ping timeout: 255 seconds) |
04:58
🔗
|
|
robogoat has joined #archiveteam-ot |
05:12
🔗
|
|
wp494 has quit IRC (Ping timeout: 268 seconds) |
05:12
🔗
|
|
wp494 has joined #archiveteam-ot |
05:59
🔗
|
|
Mateon1 has quit IRC (Read error: Operation timed out) |
05:59
🔗
|
|
Mateon1 has joined #archiveteam-ot |
09:33
🔗
|
VoynichCr |
JAA: cool, found this searching "archives" https://twitter.com/archivesnext/lists/archives-on-twitter/members |
09:34
🔗
|
VoynichCr |
more https://twitter.com/loharasukidog/lists/archives-n-such/members |
09:54
🔗
|
VoynichCr |
we neeeeeed snscrape ircbot |
10:10
🔗
|
VoynichCr |
JAA: can you add scraping for twitter-list-members? |
10:46
🔗
|
|
killsushi has quit IRC (Quit: Leaving) |
12:16
🔗
|
|
BlueMax has quit IRC (Read error: Connection reset by peer) |
13:37
🔗
|
|
Raccoon has quit IRC (Ping timeout: 265 seconds) |
13:37
🔗
|
|
Raccoon has joined #archiveteam-ot |
14:15
🔗
|
JAA |
VoynichCr: Yup. I'll probably make it return users, not Tweets. |
14:25
🔗
|
|
alex__ has joined #archiveteam-ot |
14:26
🔗
|
|
alex__ has quit IRC (Remote host closed the connection) |
14:26
🔗
|
|
alex__ has joined #archiveteam-ot |
14:36
🔗
|
|
Raccoon has quit IRC (Ping timeout: 252 seconds) |
14:48
🔗
|
|
alex__ has quit IRC (Ping timeout: 255 seconds) |
14:48
🔗
|
|
alex__ has joined #archiveteam-ot |
14:53
🔗
|
VoynichCr |
JAA: yeah |
15:01
🔗
|
|
alex__ has quit IRC (alex__) |
15:04
🔗
|
|
alex__ has joined #archiveteam-ot |
15:19
🔗
|
|
chirlu has quit IRC (Read error: Operation timed out) |
15:39
🔗
|
|
chirlu has joined #archiveteam-ot |
16:21
🔗
|
|
icedice has joined #archiveteam-ot |
16:37
🔗
|
|
DogsRNice has joined #archiveteam-ot |
17:17
🔗
|
|
alex__ has quit IRC (Quit: alex__) |
18:24
🔗
|
|
SuperCalc has joined #archiveteam-ot |
18:24
🔗
|
JAA |
Continuing from #archivebot |
18:24
🔗
|
SuperCalc |
yeah |
18:24
🔗
|
SuperCalc |
so are there any other options for what Im trying to do? |
18:25
🔗
|
JAA |
SuperCalc: I don't know enough about MtG/card trading to give you a real answer, but what you're looking for is a scraper, a tool that extract the information you're interested in from a website. |
18:25
🔗
|
JAA |
extracts* |
18:25
🔗
|
SuperCalc |
i see. |
18:25
🔗
|
SuperCalc |
do scrapers cost anything? |
18:27
🔗
|
JAA |
Well, complicated question. There is certainly commercial scraping software, but you can also write your own if you know how to code. Then you'll need a machine to actually run the scraping on, storage for the results, network, etc. You will likely also find that websites don't exactly like scrapers since they can generate significant server load. |
18:28
🔗
|
SuperCalc |
i see |
18:28
🔗
|
SuperCalc |
well, scryfall doesnt own the copyright to the cards they're databasing, they're an unofficial mtg site |
18:28
🔗
|
SuperCalc |
and i only would need to do the scraping once a year probably |
18:28
🔗
|
SuperCalc |
well, once a year except this year |
18:29
🔗
|
SuperCalc |
if scryfall has a problem with it they can just tell me to stop right? |
18:29
🔗
|
JAA |
If they know how to contact you, yes. |
18:29
🔗
|
JAA |
Or they'll just ban your IP etc. |
18:29
🔗
|
SuperCalc |
ah |
18:29
🔗
|
SuperCalc |
ill have to find out what their policy is on data scraping then |
18:30
🔗
|
JAA |
Is this of any use to you? https://scryfall.com/docs/api/bulk-data |
18:30
🔗
|
SuperCalc |
no |
18:30
🔗
|
SuperCalc |
the bulk data does not include pricing |
18:30
🔗
|
SuperCalc |
unfortunately |
18:30
🔗
|
JAA |
Ah |
18:30
🔗
|
SuperCalc |
it says that in their api documentation |
18:31
🔗
|
SuperCalc |
ill see if i can find out if their site says anything about data scraping |
18:31
🔗
|
SuperCalc |
hang on |
18:31
🔗
|
JAA |
The API itself does seem to have pricing information, and likely also has clear rate limits somewhere. |
18:31
🔗
|
JAA |
10 requests per second |
18:32
🔗
|
JAA |
And 429 responses if you exceed that. |
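A rough sketch of how a client might respect the limit mentioned above (about 10 requests per second, with 429 responses when it is exceeded). The `fetch` callable is a hypothetical stand-in for whatever HTTP client is used, and the retry count and backoff schedule are assumptions, not anything Scryfall documents.

```python
import time
from typing import Callable, Iterable, List, Tuple

# Assumption taken from the chat: roughly 10 requests per second are allowed.
MAX_REQUESTS_PER_SECOND = 10
MIN_INTERVAL = 1.0 / MAX_REQUESTS_PER_SECOND


def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], Tuple[int, str]],
    sleep: Callable[[float], None] = time.sleep,
    max_retries: int = 5,
) -> List[str]:
    """Fetch each URL in turn, pausing between requests and backing off on 429.

    `fetch` is a hypothetical callable returning (status_code, body); plug in
    whatever HTTP client you actually use.
    """
    bodies: List[str] = []
    for url in urls:
        for attempt in range(max_retries):
            status, body = fetch(url)
            sleep(MIN_INTERVAL)  # pause after every request to stay under the assumed limit
            if status == 429:
                # Too Many Requests: wait longer on each retry (1 s, 2 s, 4 s, ...).
                sleep(2 ** attempt)
                continue
            if status != 200:
                raise RuntimeError(f"unexpected status {status} for {url}")
            bodies.append(body)
            break
        else:
            # The inner loop never hit `break`, so every attempt was a 429.
            raise RuntimeError(f"gave up on {url} after {max_retries} attempts")
    return bodies
```

Passing `sleep` as a parameter keeps the pacing logic testable without actually waiting; in real use the default `time.sleep` applies.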
18:32
🔗
|
SuperCalc |
found something |
18:32
🔗
|
SuperCalc |
this was mentioned in their terms of service: |
18:32
🔗
|
SuperCalc |
You may not scrape Scryfall or solicit information from Scryfall users for the purposes of recruitment, head-hunting, or job boards |
18:33
🔗
|
SuperCalc |
but since i'm not doing this for any of those purposes, i think i should be fine |
18:33
🔗
|
SuperCalc |
question |
18:34
🔗
|
SuperCalc |
if i do use a data scraper, will i need more storage space to put it on? |
18:34
🔗
|
SuperCalc |
there are 10186 cards in the format |
18:34
🔗
|
JAA |
Uh, well, you will have to store the scraping results, so yes. |
18:34
🔗
|
JAA |
But it won't use much storage. |
18:34
🔗
|
SuperCalc |
how big are scraping results typically? |
18:34
🔗
|
SuperCalc |
will it fit comfortably on my laptop? |
18:34
🔗
|
JAA |
There's no way to answer that. |
18:34
🔗
|
SuperCalc |
hmm |
18:35
🔗
|
JAA |
It all depends on what you're scraping and how much data you're collecting. |
18:35
🔗
|
SuperCalc |
right |
18:35
🔗
|
SuperCalc |
well, if i limit it to just the card images and the card names of cards that are 15 cents or less |
18:35
🔗
|
JAA |
If you just need a card ID and a price, then that's maybe 20 bytes per card, so it would be under 1 MB before even compressing. |
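The estimate above can be checked with quick arithmetic, using the 10186 cards mentioned earlier in the conversation and the rough guess of 20 bytes per card. Both numbers come from the chat; the function name is just for illustration.

```python
CARD_COUNT = 10186      # number of cards in the format, as stated in the chat
BYTES_PER_CARD = 20     # rough guess from the chat for a card ID plus a price


def estimated_size_bytes(cards: int = CARD_COUNT, per_card: int = BYTES_PER_CARD) -> int:
    """Return the uncompressed size estimate in bytes."""
    return cards * per_card


total = estimated_size_bytes()
print(f"{total} bytes, about {total / 1_000_000:.2f} MB")
# prints "203720 bytes, about 0.20 MB"
```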
18:35
🔗
|
SuperCalc |
ah then that should probably be good then |
18:36
🔗
|
SuperCalc |
although card id alone probably wont be enough, it would be better to have the card images and maybe the names too |
18:36
🔗
|
SuperCalc |
that way people can browse through the legal cards in the format more easily |
18:36
🔗
|
|
terorie has quit IRC (Ping timeout: 615 seconds) |
18:36
🔗
|
SuperCalc |
without having to constantly copy and paste a code |
18:36
🔗
|
SuperCalc |
into a card pricer |
18:37
🔗
|
SuperCalc |
or something |
18:37
🔗
|
SuperCalc |
sorry i mean the gatherer |
18:37
🔗
|
SuperCalc |
not card pricer |
18:37
🔗
|
SuperCalc |
since the prices would be just from the day of rotation |
18:37
🔗
|
SuperCalc |
misspoke |
18:38
🔗
|
SuperCalc |
found a data scraper app that doesnt require me to know how to code |
18:38
🔗
|
SuperCalc |
it was on a list of best free data scrapers |
18:38
🔗
|
JAA |
Images will obviously make it much larger. In any case, grab a small sample, e.g. 1 % or 1 ‰, and extrapolate from that. |
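One way to do the sample-and-extrapolate step is sketched below. It assumes you already have a list of card identifiers and some way to measure the size of one downloaded item; `size_of` is a hypothetical callable standing in for that, and the fixed seed is only there to make runs repeatable.

```python
import random
from typing import Callable, Sequence


def extrapolate_total_size(
    items: Sequence[str],
    size_of: Callable[[str], int],
    fraction: float = 0.01,
    seed: int = 0,
) -> float:
    """Estimate the total size of all items from a random sample.

    `size_of` is a hypothetical function returning the size in bytes of one
    item (for example, after downloading one card image).
    """
    if not items:
        return 0.0
    # Sample at least one item, otherwise roughly `fraction` of the collection.
    sample_count = max(1, round(len(items) * fraction))
    rng = random.Random(seed)
    sample = rng.sample(list(items), sample_count)
    mean_size = sum(size_of(item) for item in sample) / sample_count
    # Scale the average sample size up to the whole collection.
    return mean_size * len(items)
```

With 10186 cards and `fraction=0.01`, this would measure 102 cards and scale their average size up to the full set. The estimate is only as good as the sample is representative, so a few unusually large images can skew it.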
18:38
🔗
|
SuperCalc |
ok sounds good |
18:38
🔗
|
SuperCalc |
thanks! :) |
18:50
🔗
|
SuperCalc |
well |
18:51
🔗
|
SuperCalc |
ive read a bit more on the legal issues of data scraping |
18:51
🔗
|
SuperCalc |
and it suddenly seems like just because what i want to do is technically allowed by scryfall's tos |
18:51
🔗
|
SuperCalc |
doesnt mean the law will agree with me |
18:51
🔗
|
SuperCalc |
im going to have to send a message to scryfall and ask for permission |
18:52
🔗
|
|
Ryz has joined #archiveteam-ot |
18:52
🔗
|
Ryz |
...E3 2019 is upon us...and prepare to be disappointed and expect leaks lol |
19:27
🔗
|
SuperCalc |
what do you guys think of this message for their contact us form? |
19:28
🔗
|
SuperCalc |
or wait |
19:28
🔗
|
SuperCalc |
it doesnt work in irc chat nvm |
19:28
🔗
|
SuperCalc |
ill ask somewhere else |
20:08
🔗
|
Ryz |
Xbox E3 2019 conference started |
20:08
🔗
|
Ryz |
First, Outer Worlds; then Ninja Theory's new game, then Ori 2 |
20:11
🔗
|
|
icedice has quit IRC (Read error: Operation timed out) |
20:50
🔗
|
|
SuperCalc has quit IRC (Quit: Page closed) |
20:59
🔗
|
systwi |
GLaDOS: Thanks dude, will give it a go |
21:51
🔗
|
|
Stilettoo has joined #archiveteam-ot |
21:54
🔗
|
|
Stiletto has quit IRC (Ping timeout: 255 seconds) |
21:58
🔗
|
ivan |
now that we have qwarc can someone please just run it all over twitter |
22:28
🔗
|
Fusl |
:D |
22:42
🔗
|
|
dhyan_nat has quit IRC (Read error: Operation timed out) |
22:44
🔗
|
JAA |
Speaking of which, Fusl, how's your Twitter firehose plan doing? |
22:44
🔗
|
Fusl |
need dem twitter account ids |
22:45
🔗
|
JAA |
Have you talked to Jason Baumgartner yet? |
22:45
🔗
|
JAA |
I assume he continued working on it since writing that article. |
22:46
🔗
|
Fusl |
who is jason baumgartner |
22:46
🔗
|
Fusl |
oh that guy |
22:46
🔗
|
Fusl |
no i havent |
22:47
🔗
|
Fusl |
i'll get in contact with him |
22:47
🔗
|
Fusl |
on twitter |
22:47
🔗
|
Fusl |
:P |
22:47
🔗
|
JAA |
Another thing I'd like to do is collect metadata about Twitter users. |
22:47
🔗
|
JAA |
All of them, ideally. |
22:47
🔗
|
Fusl |
metadata? |
22:47
🔗
|
JAA |
How many tweets, followers, followees, etc. |
22:48
🔗
|
Fusl |
ah |
23:01
🔗
|
JAA |
Sigh, Twitter... |
23:01
🔗
|
|
BlueMax has joined #archiveteam-ot |
23:02
🔗
|
JAA |
snscrape twitter-search '#dkpol until:2018-08-09' just dies after 1007 results even though there are many, many older tweets. |
23:09
🔗
|
JAA |
Can reproduce it in Firefox as well. This is weird. |
23:11
🔗
|
JAA |
Guess I'll just continue with until:2018-08-08. :-/ |
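The workaround above (restarting the search with an earlier `until:` date whenever it stops early) could be scripted roughly as follows. This only builds the command lines shown in the chat; it does not call snscrape itself, and the helper name and the one-day step are assumptions. Consecutive windows will overlap, so results would need deduplicating afterwards.

```python
from datetime import date, timedelta
from typing import Iterator


def until_queries(term: str, start: date, days: int) -> Iterator[str]:
    """Yield search queries whose until: date steps back one day at a time."""
    for offset in range(days):
        day = start - timedelta(days=offset)
        yield f"{term} until:{day.isoformat()}"


for query in until_queries("#dkpol", date(2018, 8, 9), 3):
    # Print the command line rather than running it.
    print(f"snscrape twitter-search '{query}'")
```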
23:13
🔗
|
|
wp494 has quit IRC (Read error: Operation timed out) |
23:15
🔗
|
|
wp494 has joined #archiveteam-ot |
23:54
🔗
|
|
Odd0002_ has joined #archiveteam-ot |
23:56
🔗
|
|
Odd0002 has quit IRC (Ping timeout: 252 seconds) |
23:56
🔗
|
|
Odd0002_ is now known as Odd0002 |